INTRODUCTION

As a consequence of the development of high-speed data processors, groups of investigators became aware of a difficulty: supplying data to high-speed digital computers at a speed comparable to that of the computer itself. Unfortunately, at the present stage of the art, all data collection must be done through the human channel. This channel is amazingly flexible and complex, but it is inherently slow because of its low capacity. Very little of practical consequence has been done in this line. A few attempts to recognize time patterns can be listed, such as the work on the recognition of sound done at IBM, London University, and BTS.

Most of the approaches that have been investigated are deterministic in nature. They would work well only in situations where the signals are strictly constrained to be of a deterministic type. As a result, the various authors have been forced to place severe restrictions on the applicability of their methods. Two typical examples of the failures to which these endeavors are doomed are the BTS digit recognizer, which had to be restricted to a single speaker, and the recognizing machine developed at London University, which systematically failed on some sound combinations.

In recent years a new point of view has been formulated. It may be summarized as follows: to perform recognition, redundancy reduction, and noise elimination, one must deal with a cognitive system. In other words, a system has a fair chance of success only if it is able to learn the probability distributions of the ensemble on which it operates.

Along this line of thought we find the contributions of Allanson, at the University of Birmingham, Taylor, at London University, and Rosenblatt, at Cornell Aeronautical Laboratories.

The purpose of this paper is to investigate and evaluate the theory of operation of a machine, or class of machines, called the Perceptron. The Perceptron was originated by Dr. Rosenblatt of the Cornell Aeronautical Laboratories.


BASIC CONCEPTS OF STATISTICAL SEPARABILITY THEORY


The Theory of Statistical Separability is the theory of operation of a system, called the Perceptron, which operates according to certain statistical principles.

The system is designed to respond to a statistical bias. Information is stored by retaining what is essential to the classification or discrimination of stimuli. The system employs associative memory rather than exact reproduction of remembered material.

In the Perceptron, memory is realized in quite a different fashion from the memory system of a digital computer. The representational memory employed in the computer is a logically translatable coding of the information to be stored. Suppose, for example, that the Perceptron used digital computer memory devices. Each on-off retinal cell would then need a corresponding storage location for one bit of information, so a million on-off retinal cells would require a million storage units. This requirement is realizable. However, the time required to select or identify the storage unit that corresponds most nearly to each new input would fall far short of simulating any operation comparable to human visual performance. Fortunately, this type of memory is not employed in the Perceptron. Instead, an associative memory is used to identify or discriminate inputs. The organization of the Perceptron will be discussed later. At this point it is sufficient to say that the retinal cells are connected at random to a set of cells called association units. Thus any pattern of cells stimulated on the retina activates a subset of these association cells. With this associative type of memory, the information is contained in the pattern of connections leading from the stimulated points of the retina to the active association units.
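The random wiring just described can be illustrated with a small Python sketch. It is a toy model built on our own assumptions, not the Perceptron's actual design. Every connection is excitatory, and each A-unit has the same hypothetical number of origin points (`fan_in`). An A-unit fires when at least `threshold` of its origin points are stimulated. The names `build_connections` and `active_a_units` are illustrative.

```python
import random


def build_connections(n_s, n_a, fan_in, rng):
    """Connect each A-unit to `fan_in` distinct S-points chosen uniformly at random."""
    return [rng.sample(range(n_s), fan_in) for _ in range(n_a)]


def active_a_units(stimulus, connections, threshold):
    """Indices of the A-units with at least `threshold` stimulated origin points."""
    stimulated = set(stimulus)
    return {
        a
        for a, origins in enumerate(connections)
        if sum(1 for s in origins if s in stimulated) >= threshold
    }


rng = random.Random(0)
conns = build_connections(n_s=400, n_a=1000, fan_in=5, rng=rng)
pattern = rng.sample(range(400), 80)  # light 20% of a 400-point retina
subset = active_a_units(pattern, conns, threshold=2)
print(len(subset), "of", len(conns), "A-units active")
```

Each new pattern reaches its own subset of the same A-units, so no per-point storage location has to be searched. What is remembered is carried by which A-units a pattern reaches.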

In place of errorless retention, the system relies on redundancy: the same association units are used repeatedly.

The system will occasionally misidentify a pattern that it has correctly identified before. This happens not because the electronic hardware malfunctions, but because the system operates in a probabilistic manner. Since the system is statistical in nature, the probability of correct recognition fluctuates with time; that is, the adaptation of the system to its inputs is a function of time. Learning takes place, and the system is then said to adapt to its environment. It follows that the statistical bias which determines the proper response will change with time.

Connections are made essentially at random, within the limitations of the plan of organization. In an analogous manner, the biological nervous system is assumed to have entire freedom in the details of its connections.

According to theories of the biological nervous system, a system spontaneously adapts to its environment by one of two possible methods.

In one theory, a system learns or adapts to its environment through changes in network topology. As the nervous system adapts to its environment, the neuron connections, or branches of the neuron network, continually change their topology.

The other theory assumes that a system adapts to its environment by changing a value function associated with the neurons. The network, once established (at birth), remains constant throughout the system's entire life. Learning is accomplished by changes in some parameters of the neuron composition.

The latter theory of learning is the basis of Dr. Rosenblatt's Perceptron Theory.

ORGANIZATION OF THE PERCEPTRON

The basic organization of the Perceptron consists of a sensory unit, two response units, R1 and R2, and their associated source sets, A1 and A2, respectively. The relatively simple model shown is capable of only a limited vocabulary. It nevertheless serves to illustrate the basic principles of the function and organization of the Perceptron. One way to represent the basic organization pictorially is the Venn diagram of Fig. 1, Plate I. The circles represent sets or classes of units, and the arrowed lines indicate directional excitatory connections between the various sets of units. The lines terminated by small circles indicate inhibitory connections. Figure 2, Plate I, is a schematic representation corresponding to Fig. 1, Plate I.

Now consider the rules which govern the connections between the different sets of units of the Perceptron. The connections between S-points and A-units are distributed uniformly at random; that is, any S-point may be connected to any A-unit with equal probability. Each S-point may be connected to several A-units distributed uniformly over the entire A-set, and each A-unit has several S-points connected to it. These S-points are called the origin points of the A-unit. In the simplest Perceptron the origin points are distributed uniformly at random throughout the S-set. However, for the Perceptron to be sensitive to contours and gradients, the origin points of a single A-unit must be concentrated in a small area. One example is an exponential distribution about a central point.
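The sketch below contrasts the two ways of choosing origin points on a hypothetical 64 × 64 retina. The exponential clustering follows the text. The rest is our own assumption: the radius is drawn with `random.expovariate` in a random direction, then rounded to the grid and clipped at the edges. All function names are likewise our own.

```python
import math
import random


def uniform_origins(width, height, fan_in, rng):
    """Simplest Perceptron: origin points scattered uniformly over the whole S-set."""
    cells = rng.sample(range(width * height), fan_in)
    return [(c % width, c // width) for c in cells]


def localized_origins(width, height, fan_in, mean_radius, rng):
    """Origin points clustered about a random centre; each point's distance from
    the centre is exponentially distributed with mean `mean_radius`."""
    if fan_in > width * height:
        raise ValueError("fan_in exceeds the number of S-points")
    cx, cy = rng.randrange(width), rng.randrange(height)
    points = set()
    while len(points) < fan_in:  # redraw until fan_in distinct points are found
        r = rng.expovariate(1.0 / mean_radius)
        theta = rng.uniform(0.0, 2.0 * math.pi)
        x = min(max(round(cx + r * math.cos(theta)), 0), width - 1)  # clip to grid
        y = min(max(round(cy + r * math.sin(theta)), 0), height - 1)
        points.add((x, y))
    return sorted(points)


def spread(points):
    """Mean distance of the points from their centroid."""
    mx = sum(x for x, _ in points) / len(points)
    my = sum(y for _, y in points) / len(points)
    return sum(math.hypot(x - mx, y - my) for x, y in points) / len(points)


rng = random.Random(1)
uniform_spreads = [spread(uniform_origins(64, 64, 8, rng)) for _ in range(200)]
local_spreads = [spread(localized_origins(64, 64, 8, 2.0, rng)) for _ in range(200)]
print(sum(uniform_spreads) / 200, sum(local_spreads) / 200)
```

The localized units see only a small neighbourhood of the retina, which is what lets an A-unit respond to a local contour or gradient rather than to scattered points.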

The A-units are connected to the R-units at random, in the same way as the S-points are connected to the A-units. In general, this connection results in three A-subsets:
- The first subset consists of the A-units transmitting to response unit R1 (the A1-set, or R1-source set).
- The second subset consists of the A-units connected to response unit R2 (the A2-set, or R2-source set).
- The third is a small subset of overlapping A-units connected to both response units, R1 and R2.

EXPLANATION OF PLATE I

Fig. 1. A Venn diagram of the organization of a simple Perceptron.

Fig. 2. A schematic representation corresponding to Fig. 1.

In Fig. 1, the A1-set is shown as the upper circle and the A2-set as the lower circle. The intersection of the circles represents the overlapping set of A-units. The actual system needs no topographical segregation of the R-source sets; the diagram was drawn this way only for clarity.
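A minimal sketch of how random A-to-R connections split the A-system into an R1-source set, an R2-source set and a small overlap. The assignment scheme is an assumption made for illustration: each A-unit joins one randomly chosen source set and, with a small hypothetical probability `p_overlap`, also joins a second one.

```python
import random


def source_sets(n_a, n_r, p_overlap, rng):
    """Assign every A-unit to the source set of one randomly chosen R-unit; with
    probability p_overlap it is also connected to a second, different R-unit."""
    sets = [set() for _ in range(n_r)]
    for a in range(n_a):
        r = rng.randrange(n_r)
        sets[r].add(a)
        if n_r > 1 and rng.random() < p_overlap:
            other = rng.choice([k for k in range(n_r) if k != r])
            sets[other].add(a)
    return sets


rng = random.Random(2)
a1, a2 = source_sets(n_a=1000, n_r=2, p_overlap=0.05, rng=rng)
overlap = a1 & a2  # the intersection of the two circles in the Venn diagram
print(len(a1 - overlap), len(a2 - overlap), len(overlap))
```

The sets are defined purely by connections, so nothing requires the units of one source set to sit together physically.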

As illustrated in Fig. 1, the response units are mutually exclusive. When R1, for example, responds to a stimulus, it sends inhibitory impulses to the A-units of the A2-set and to the other response unit, R2. Thus the rule of connection is that each R-unit inhibits the complement of its source set. Once one R-unit has responded, it suppresses the other source sets and limits activity to the dominant A-set. These inhibitory impulses prevent the non-dominant responses from being activated through the intersection of their source sets with the source set of the dominant response.

Consider what happens when the first stimulus is presented to the sensory points of the Perceptron. A subset of uniformly distributed members of the A-system responds; this activated set is the superset responding to the presented stimulus. At this point probably no R-unit will respond, since the activated A-units are distributed uniformly over all source sets. The experimenter then forces a response unit, say R1, to respond, and R1 suppresses the other R-source sets. The members of the original superset are therefore inhibited, except those in the R1-source set, which become the dominant subset. This dominant R1-subset consists of the units of the R1-source set that are active for a particular stimulus associated with R1. Upon activation, these units gain value with respect to the rival subsets. Discrimination is based on the net value of the source sets. Hence, the more stimuli of this type are presented, the higher the probability that R1 will respond autonomously.

Similarly, when a different type of stimulus is presented, R2 may be forced to respond. When R2 responds, it inhibits the other source sets, so they cannot gain value. Only the dominant R2-subset (the activated units of the R2-source set) gains value with respect to its complementary set.

The more often the type of stimulus associated with R2 is presented, the higher the value of R2 due to stimulus S2 becomes. The probability of a correct response by R2 rises accordingly.

In the biological nervous system there are three classes of cells: sensory, associative, and motor neurons. Corresponding to the biological system, the Perceptron has three elementary units: S-points (sensory points in a simulated retina), A-units (association units), and R-units (response units).

The sensory units receive the stimuli, whatever they may be. For example, in the photoperceptron the stimulus is proportional to the level of illumination. The response units may be considered the code center, or label, of a particular class of stimulus.

The activity of the Perceptron upon the presentation of a stimulus is divided into two phases, the predominant and the postdominant phases.

The predominant phase is only a transient phenomenon. When a stimulus is shown to the sensory system, a certain number of A-units are activated, some of them members of both source sets. The source set containing the largest number of activated units (say that of R1) will tend to have a higher net value than the other set, so R1 will tend to respond. As R1 responds, it suppresses the R2-source set and the R2 response unit. This procedure takes place in a very short time and is essentially transient.

Once a response unit has responded and the complement set has been suppressed, the Perceptron is in the postdominant phase of its activity. During this phase the activated A-units that were not suppressed gain an increment of value, while the inactive A-units remain unchanged. The next time the same stimulus is presented, the same reinforced A-units will be reactivated. Because they have gained value, they will produce the correct response with a higher probability. All of the above describes the reaction of the Perceptron to a presented stimulus.
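The sequence of a forced learning trial followed by a spontaneous test can be sketched as follows. The sketch assumes a sum-discriminating system (total value per source set) with Alpha-style gains, meaning one increment of value per active, unsuppressed A-unit. The function `present` and its `learn` flag, which allows testing without reinforcement, are illustrative. So is the tie rule, under which no unit responds while the net values are equal.

```python
def present(active, sources, values, forced=None, learn=True):
    """One stimulus presentation in a sum-discriminating, Alpha-style sketch.

    Predominant phase: each R-unit's net value is the summed value of the active
    A-units in its source set; a unique maximum wins, unless the experimenter
    forces a response. Postdominant phase: the winner inhibits the complement of
    its source set, and each surviving active A-unit gains one increment of value.
    Returns the index of the responding R-unit, or None if no unit dominates.
    """
    totals = [sum(values[a] for a in active & src) for src in sources]
    if forced is not None:
        winner = forced
    else:
        best = max(totals)
        leaders = [r for r, t in enumerate(totals) if t == best]
        if len(leaders) != 1:
            return None  # tie: no response becomes dominant
        winner = leaders[0]
    if learn:
        for a in active & sources[winner]:  # the dominant subset
            values[a] += 1
    return winner


sources = [{0, 1, 2, 3}, {3, 4, 5, 6}]  # R1 and R2 source sets; unit 3 is in both
values = [0] * 7
stim_x, stim_y = {0, 1, 3, 4}, {2, 4, 5, 6}

present(stim_x, sources, values, forced=0)  # learning: experimenter forces R1
present(stim_y, sources, values, forced=1)  # learning: experimenter forces R2
print(present(stim_x, sources, values, learn=False),
      present(stim_y, sources, values, learn=False))
# prints: 0 1
```

Before any reinforcement both totals are zero, so the sketch returns None. This matches the remark that at first probably no R-unit responds.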

Dr. Rosenblatt carried out the detailed analytical description of the predominant phase of the Perceptron response. It is given in his report "A Theory of Statistical Separability in Cognitive Systems".

Two variables are sufficient to describe the predominant phase of the Perceptron:
- Pa, the expected proportion of A-units activated by a particular stimulus of a given size;
- Pc, the expected proportion of A-units activated by one stimulus which are also activated by another stimulus.

Numerical values of the equations for Pa and Pc were obtained by a Monte Carlo computation on the IBM 704 computer. The equations for Pa and Pc are essentially functions of the parameters of the Perceptron organization.

The connections between the S- and A-units are distributed at random and homogeneously. Each A-unit receives some excitatory connections from the sensory cells and may, but need not, also receive some inhibitory ones. The only restriction on the design of the connections is that no two A-units are connected to identical sets of S-points. This restriction ensures the maximum difference in the response of the system to different stimuli.

When any stimulus is presented to the sensory mosaic, a set of S-points is stimulated. The S-points are connected to the A-units by excitatory and inhibitory connections. An A-unit is activated if the stimulus excites a sufficient net number of its excitatory connections. That is, an A-unit responds, or becomes active, if its net excitation is greater than or equal to its threshold value. Pa and Pc depend on the possible combinations of excitatory and inhibitory connections and on the threshold values of the A-units. Hence the analysis of these quantities is essentially a study of the design possibilities of the Perceptron system.
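The following Python sketch shows, under stated assumptions, the kind of Monte Carlo estimate of Pa and Pc described above. Each hypothetical A-unit has `n_exc` excitatory and `n_inh` inhibitory origin points and fires when its net excitation is at least `theta`. Duplicate connection sets are rejected, following the restriction that no two A-units share identical S-points. The parameter values are arbitrary and are not those used in the IBM 704 computations.

```python
import random


def make_a_units(n_s, n_a, n_exc, n_inh, rng):
    """Random S-to-A wiring: each A-unit gets n_exc excitatory and n_inh inhibitory
    origin points (all distinct). A connection set already in use is redrawn, so
    no two A-units are connected to identical sets of S-points."""
    seen, units = set(), []
    while len(units) < n_a:
        pts = rng.sample(range(n_s), n_exc + n_inh)
        unit = (frozenset(pts[:n_exc]), frozenset(pts[n_exc:]))
        if unit not in seen:
            seen.add(unit)
            units.append(unit)
    return units


def activated(units, stimulus, theta):
    """A-units whose net excitation (stimulated excitatory origin points minus
    stimulated inhibitory ones) is greater than or equal to the threshold theta."""
    return {i for i, (exc, inh) in enumerate(units)
            if len(exc & stimulus) - len(inh & stimulus) >= theta}


def estimate_pa_pc(n_s=500, n_a=2000, n_exc=6, n_inh=2, theta=2,
                   lit_fraction=0.2, trials=20, seed=0):
    """Monte Carlo estimates of Pa (proportion of A-units activated by a stimulus of
    fixed size) and Pc (proportion of those also activated by a second, independent
    stimulus of the same size)."""
    rng = random.Random(seed)
    units = make_a_units(n_s, n_a, n_exc, n_inh, rng)
    size = int(lit_fraction * n_s)
    pa_total, pc_total, pc_count = 0.0, 0.0, 0
    for _ in range(trials):
        s1 = set(rng.sample(range(n_s), size))
        s2 = set(rng.sample(range(n_s), size))
        act1, act2 = activated(units, s1, theta), activated(units, s2, theta)
        pa_total += len(act1) / n_a
        if act1:
            pc_total += len(act1 & act2) / len(act1)
            pc_count += 1
    return pa_total / trials, (pc_total / pc_count if pc_count else 0.0)


pa, pc = estimate_pa_pc()
print(f"Pa ~ {pa:.3f}, Pc ~ {pc:.3f}")
```

Raising `theta` lowers Pa for the same wiring and stimuli. Choices of this kind fix Pa and Pc, which the later analysis then takes as given.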

This report is not concerned with that aspect of the Perceptron analysis. It is concerned with the feasibility of such a system for its intended purpose, that of learning its environment.

ALTERNATE  PERCEPTRON  MODELS 

Given the general statistical separability theory and the rules of organization, several alternative Perceptrons are possible. On the basis of response-unit discrimination there are two possible forms, the sum-value and the mean-value systems. In the sum-value system, the response units are discriminated by comparing the total value of each A-subset (the set of active A-units in each source set).

In the mean-value system, discrimination compares the mean value over the sets of active A-units. An average is taken over each entire active subset, giving the mean value per active A-unit in each subset, and the comparison for discrimination is then made between the source sets. The sum-discriminating and mean-discriminating systems will be denoted Σ-systems and μ-systems, respectively.
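Σ- and μ-discrimination differ only in how the values of the active units in each source set are pooled. The function names in this illustrative sketch are our own. A source set with no active units is given a mean of 0, which is also an assumption. On the same data, the two rules disagree:

```python
def sum_discriminate(active, sources, values):
    """Σ-system: choose the source set whose active A-units have the largest total value."""
    totals = [sum(values[a] for a in active & src) for src in sources]
    return max(range(len(sources)), key=lambda r: totals[r])


def mean_discriminate(active, sources, values):
    """μ-system: choose the source set with the largest mean value per active A-unit.

    A source set with no active units is given a mean of 0 (an assumed convention).
    """
    means = []
    for src in sources:
        subset = active & src
        means.append(sum(values[a] for a in subset) / len(subset) if subset else 0.0)
    return max(range(len(sources)), key=lambda r: means[r])


sources = [set(range(6)), {6, 7}]  # index 0: R1-source set, index 1: R2-source set
values = [1, 1, 1, 1, 1, 1, 2, 2]
active = set(range(8))
print(sum_discriminate(active, sources, values), mean_discriminate(active, sources, values))
# prints: 0 1
```

The Σ-system favours R1 because six moderately valued units outweigh two highly valued ones. The μ-system ignores how many units are active and favours R2.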

Three alternative Perceptron models will be considered with respect to the dynamics of the value change of each source set. The first is the uncompensated gain system, called the Alpha Perceptron. Each A-unit gains one increment of value per unit of time that it is activated. An inactive or suppressed A-unit keeps a constant value, determined by the number of reinforcements it has previously received. Thus the total value gain of a source set per reinforcement equals the number of activated A-units in that source set, and the mean value of the A-system increases with the number of reinforcements. This system has the advantage of being easy to design. However, it must operate under a restrictive condition: on the average, each response unit must be reinforced (become dominant) with equal frequency. In the random environment, the probability of correct response falls to the chance expectancy of 0.5 once n_s, the number of stimuli presented to the system, becomes large enough. Under these conditions the system is said to be saturated.
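The Alpha gain rule for one source set can be written as a small update function. The sketch stores value in a plain list and, by assumption, ignores overlap units. The name `alpha_update` is hypothetical.

```python
def alpha_update(values, source, active, dominant):
    """Alpha (uncompensated gain) rule for one source set.

    If the set is dominant, each of its active A-units gains one increment of
    value; inactive or suppressed units keep the value they already have.
    """
    if dominant:
        for a in source & active:
            values[a] += 1


values = [0] * 6
r1, r2 = {0, 1, 2, 3}, {4, 5}  # disjoint source sets (overlap ignored)
active = {1, 2, 5}             # A-units activated by the stimulus
before = sum(values)
alpha_update(values, r1, active, dominant=True)   # R1 responds
alpha_update(values, r2, active, dominant=False)  # R2's set is suppressed
print(values, "total gain:", sum(values) - before)
# prints: [0, 1, 1, 0, 0, 0] total gain: 2
```

The total gain equals the number of active units in the dominant set, and nothing is ever subtracted. This is why the mean value of the A-system keeps rising with the number of reinforcements.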

For the Alpha system, the analysis using mean discrimination shows that performance improves for higher values of n_s. In addition, the range of values of Pa for which the system operates satisfactorily is widened.

The second model is the constant-feed system, called the Beta Perceptron. Independently of the number of reinforcements, value is fed at a constant rate to each source set of the A-system. Hence all source sets always have equal total value.

Within a source set the active units take precedence over the inactive units, so the value gain is distributed preferentially to the active units of each source set. The total value gain of a source set per reinforcement is a constant, K, and the mean value of the A-system increases with time. The gains per unit are as follows:
- An A-unit active for one unit of time gains K/N_ar, where N_ar is the number of active units in the source set.
- An inactive A-unit outside the dominant set gains K/N_Ar, where N_Ar is the number of A-units connected to a response unit.
- An inactive A-unit of the dominant set gains nothing.
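A sketch of the Beta update for one reinforcement follows. It assumes disjoint source sets (overlap units are ignored) and treats suppressed active units outside the dominant set as inactive. `beta_update` and its arguments are hypothetical. As the text requires, each source set receives exactly K per reinforcement.

```python
def beta_update(values, source, active, dominant, k=1.0):
    """Beta (constant-feed) rule: a source set receives total value k per reinforcement.

    Dominant set: k is shared equally by its N_ar active units; its inactive units gain 0.
    Other sets: every unit (suppressed ones count as inactive) gains k / N_Ar.
    If the dominant set has no active unit, nothing is added (an assumed convention).
    """
    if dominant:
        winners = source & active
        for a in winners:
            values[a] += k / len(winners)
    else:
        for a in source:
            values[a] += k / len(source)


values = [0.0] * 8
r1, r2 = {0, 1, 2, 3}, {4, 5, 6, 7}  # disjoint source sets (overlap ignored)
active = {0, 1, 5}
beta_update(values, r1, active, dominant=True)   # R1 responds
beta_update(values, r2, active, dominant=False)
print(values)
```

The non-dominant set gains value on every unit even though it did not respond. This is the accumulation of value in inactive units that the analysis below blames for the Beta system's poorer performance.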

The analysis shows that the Beta system performs worse than the Alpha system under all conditions, even with variation in n_sr, the number of stimuli associated with each response unit. The reason is the accumulation of value in the inactive units.

The parasitic gain system, or Gamma Perceptron, is the third Perceptron model to be considered. Both the total value and the mean value of each source set remain constant. Reinforcement only redistributes value among the A-units of a source set: within a source set, active A-units gain value at the expense of inactive A-units, which decrease in value.

Continuing the comparison of the logical characteristics of the three systems, the total value gain of a source set per reinforcement in the Gamma system is, of course, zero. An A-unit active for one unit of time gains one increment of value. The inactive A-units outside the dominant set gain zero value, while each inactive A-unit of the dominant set loses

$$\frac{N_{ar}}{N_{Ar} - N_{ar}}$$

increments of value. The N_Ar - N_ar inactive units together therefore lose exactly the N_ar increments gained by the active units.
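The Gamma redistribution rule can be sketched in the same way. The function name is hypothetical and overlap is ignored. The text does not cover a dominant set that is entirely active or entirely inactive, so leaving the values unchanged in that case is our own assumption.

```python
import math


def gamma_update(values, source, active, dominant):
    """Gamma (parasitic gain) rule for one source set.

    In the dominant set each active unit gains 1 and each inactive unit loses
    N_ar / (N_Ar - N_ar), so the set's total value is unchanged. Non-dominant
    sets are untouched. If the dominant set is entirely active or entirely
    inactive, nothing changes (an assumed convention).
    """
    if not dominant:
        return
    winners, losers = source & active, source - active
    if not winners or not losers:
        return
    loss = len(winners) / len(losers)
    for a in winners:
        values[a] += 1
    for a in losers:
        values[a] -= loss


values = [0.0] * 5
source = {0, 1, 2, 3, 4}
gamma_update(values, source, active={0, 1}, dominant=True)
print(values, math.isclose(sum(values), 0.0, abs_tol=1e-12))
```

Here N_ar = 2 and N_Ar = 5, so each of the three inactive units loses 2/3 and the set's total stays at zero.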

ANALYSIS OF THE ALPHA PERCEPTRONS FOR IDEAL ENVIRONMENT


Response of the Alpha Systems with Uniform n_sr

The performance of the Perceptron will be analyzed quantitatively on the basis of a hypothetical experiment. The experiment consists of a learning period followed by a testing period, during which the capabilities of the machine are evaluated.

During the learning period a specified number of stimuli, n_s, is shown to the Perceptron. The experimenter forces each of these stimuli to become associated with one of the responses by forcing the desired response unit to respond. In the ideal environment each stimulus consists of a random collection of S-points to be stimulated. The stimuli all have the same measure; that is, each consists of the same number of S-points. It is assumed that, on the average, an equal number of stimuli is associated with each response unit. In symbols, n_s1 stimuli are associated with response R1 during the learning period.

In the testing period, the spontaneous reaction of the system to the previously reinforced stimulus s1 is observed. A response is correct if it is the same in the testing period as in the learning period.

We now consider the analysis of Pr in general. Pr is the probability of a correct response during the testing period to stimuli previously reinforced during the learning period. When a stimulus is presented, the discrimination between the response units is measured by the relative difference between the values of their source sets. Accordingly, the net bias B is defined as the net difference in value between the R1 and R2 source sets that results from presenting a stimulus. By convention, this analysis assumes only two response units, although it is valid for any number of them. Under these conditions, R1 is preferred if B is positive, and R2 is preferred if B is negative.

The net bias B can be decomposed into two components: b, the controlled bias, and d, the random bias. The controlled bias b is the value gained by the R1 source set from stimulus s1, which is associated with R1, when s1 was originally presented during the learning period. The random bias d is the net difference in value between the R1 and R2 source sets due to all stimuli other than s1.

Statistical parameters will be used extensively throughout this report. Each parameter will first be introduced, when needed, in general statistical notation with the proper explanation, and then applied to the particular problem.

The arithmetic mean of a distribution is the sum of the products of the values and their corresponding proportions. The arithmetic mean is also called the expected value of a member of the population chosen at random. Let X denote a member of the set chosen at random, and let E denote the expected value; then E(X) is the expected value of a randomly chosen member of the set. The arithmetic mean is the center of gravity of a distribution, since the sum of the proportion-weighted deviations from E(X) is zero. Note that the expected value of a quantity X may be denoted either by E(X) or by X̄; both notations will be used in this report.

Expressing the two bias components above as expected values plus their fluctuations gives the following definitions:

b̄ = the expected bias gained by the R1 source set through the reinforcement due to the stimulus in question, s1;

d̄ = the expected bias gained due to all reinforcements of the R1 and R2 source sets, exclusive of s1;

Δb = the difference between the actual value of b and its expected value b̄;

Δd = the difference between the actual value of the random bias d and its expected value d̄.

In terms of these components, the net bias may be expressed by

$$B = \bar{b} + \bar{d} + \Delta b + \Delta d \qquad (1)$$

For the response to a particular stimulus to be correct, B must be positive for that stimulus. Therefore

$$\bar{b} + \bar{d} + \Delta b + \Delta d > 0, \quad \text{or} \quad \bar{b} + \bar{d} > -(\Delta b + \Delta d),$$

which indicates that, for a correct response to occur, the sum of the expected biases must exceed the negative of the total fluctuation bias.

The performance of the Perceptron systems may be measured by the correctness of the response to any particular stimulus in question. This correctness is measured by Pr. Suppose a stimulus was associated with the R1 response unit during the learning period and is presented again in the testing phase. Pr is the probability that R1 will then be preferred over any particular rival response Rj.

From the preceding considerations of the biases of the system, Pr increases with the net expected bias and decreases with the standard deviation of the bias components b and d. Pr is thus a function of the expected bias. However, this quantity must be normalized with respect to the standard deviation of the bias components, denoted by σ(b + d). In notational form,

$$P_r = \int_{-\infty}^{(\bar{b} + \bar{d})/\sigma(b + d)} f(X)\, dX \qquad (2)$$

where f is some suitable distribution function.

Δb and Δd, the error components of the bias, are not mutually independent, because both are functions of P_Ai, the actual proportion of A-units activated by the i-th stimulus.

Thus the standard deviation of (b + d) is difficult to evaluate. However, for a fixed value of Δb, σ(d) could be calculated. The probability that the proper bias conditions exist could then be expressed by

$$\int_{-\infty}^{(\bar{b} + \bar{d} + \Delta b)/\sigma(d)} \phi(Z)\, dZ$$

where φ is a suitable distribution function depending on Δb.

If the sum over all possible values of Δb were calculated, Pr could be written as follows:

$$P_r = \sum_{\Delta b} P(\Delta b) \int_{-\infty}^{(\bar{b} + \bar{d} + \Delta b)/\sigma(d)} \phi(Z)\, dZ \qquad (3)$$

where P(Δb) is the frequency of occurrence of Δb.

To simplify this expression, consider the quantity Δb. Δb is the error component of the bias due only to the stimulus in question, whose response is measured by Pr. For the mean-discriminating systems, and for the sum system with a large number of A-units, Δb is in general small compared to Δd. However, one condition could make a critical difference in Pr if Δb were entirely neglected, namely Δb = -b̄. In that case, when s1 is presented, no A-unit in the R1 source set is activated, so the stimulus receives no controlled bias at all. For all other conditions Δb can be neglected, and the sum above reduces to one term:

$$P_r = P(\Delta b \neq -\bar{b}) \int_{-\infty}^{(\bar{b} + \bar{d})/\sigma(d)} \phi(Z)\, dZ \qquad (4)$$


In view of the central limit theorem, the most logical choice for the distribution function φ is the normal distribution. Pr is then the normal distribution integral times the corrective factor P(Δb ≠ -b̄).

The expression for Pr becomes

$$P_r = P(\Delta b \neq -\bar{b}) \, \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{Z} e^{-t^2/2}\, dt, \qquad Z = \frac{\bar{b} + \bar{d}}{\sigma(d)} \qquad (5)$$
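Equation (5) can be evaluated with the error function, since Φ(Z) = ½[1 + erf(Z/√2)]. The sketch below compares the closed form with a direct simulation of B = b̄ + d̄ + Δd. In the simulation Δd is drawn from a normal distribution and Δb is neglected, as in the text. The values chosen for b̄, d̄ and σ(d) are arbitrary, and `pr_normal` and its parameter names are hypothetical.

```python
import math
import random


def pr_normal(b_bar, d_bar, sigma_d, p_nonzero_b=1.0):
    """Equation (5): Pr = P(Δb ≠ -b̄) · Φ(Z) with Z = (b̄ + d̄) / σ(d).

    Φ is the standard normal cumulative distribution, written via math.erf.
    p_nonzero_b is the corrective factor P(Δb ≠ -b̄), supplied by the caller.
    """
    z = (b_bar + d_bar) / sigma_d
    phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return p_nonzero_b * phi


# Illustrative numbers only: expected biases and spread of the random bias.
b_bar, d_bar, sigma_d = 1.0, 0.0, 1.0
closed_form = pr_normal(b_bar, d_bar, sigma_d)

# Direct simulation of B = b̄ + d̄ + Δd > 0 with Δd ~ N(0, σ(d)) and Δb neglected.
rng = random.Random(0)
trials = 200_000
hits = sum(1 for _ in range(trials) if b_bar + d_bar + rng.gauss(0.0, sigma_d) > 0)
mc_estimate = hits / trials
print(round(closed_form, 4), round(mc_estimate, 4))
```

With Z = 1 the closed form gives about 0.8413. Any probability mass at Δb = -b̄ simply scales this value down through `p_nonzero_b`.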

In the first analysis, the behavior of the system in an ideal environment will be studied. The ideal environment is a simplification of the theoretical model, introduced to simplify the analysis; it is not an optimum. Its important feature is that it simplifies the relationship among the stimuli associated with each response unit. Each stimulus associated with a response unit has no correlation or relationship of any kind with any other stimulus of the same class. A further assumption is that all stimuli are of the same measure, so that Pa is identical for all stimuli.

The frequency of activation of the A-units determines the bias of the source sets. The bias in turn determines which responses are activated, and thus whether recognition is correct. With this in view, the details of the activity of the A-units during exposures must be considered very carefully.

Let the following notation be introduced:

P_i = the probability that the i-th stimulus will be presented to the system;

P_Ai = the probability that an A-unit will be activated by stimulus i.

The A-units are connected at random to the sensory system and to the R-units. The expected value of P_Ai is the sum of the products of the values of P_Ai and their corresponding frequencies of occurrence P_i. In symbols,

$$E(P_{Ai}) = \sum_i P_{Ai} P_i \qquad (6)$$
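Equation (6) is an ordinary probability-weighted mean. In the sketch below, the activation probabilities and presentation probabilities are invented for illustration. The sketch also checks the center-of-gravity property: the weighted deviations from the mean sum to zero.

```python
def expected_value(values, probs):
    """Equation (6): E(P_Ai) = Σ_i P_Ai · P_i for a discrete distribution."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("presentation probabilities must sum to 1")
    return sum(v * p for v, p in zip(values, probs))


p_ai = [0.02, 0.05, 0.10]  # P_Ai: activation probability for stimuli 1..3 (invented)
p_i = [0.5, 0.3, 0.2]      # P_i: presentation probabilities (invented)
pa = expected_value(p_ai, p_i)

# Centre of gravity: probability-weighted deviations from the mean cancel.
weighted_dev = sum(p * (v - pa) for v, p in zip(p_ai, p_i))
print(round(pa, 6), abs(weighted_dev) < 1e-12)
```

Here Pa = 0.5·0.02 + 0.3·0.05 + 0.2·0.10 = 0.045.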

Since E(PAi) will be used quite frequently, the notation Pa = E(PAi) is adopted. Several interpretations of Pa can now be given, keeping in mind that Pa is an expected, or mean, value.
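As a numerical illustration of (6), the short Python sketch below computes Pa for a small ensemble of stimuli. The activation and presentation probabilities are hypothetical values chosen for the example; they are not taken from the analysis.

```python
def expected_pa(p_ai, p_i):
    """Pa = E(P_Ai) = sum over i of P_Ai * P_i, as in equation (6)."""
    if len(p_ai) != len(p_i):
        raise ValueError("p_ai and p_i must have the same length")
    if abs(sum(p_i) - 1.0) > 1e-9:
        raise ValueError("the presentation probabilities P_i must sum to 1")
    return sum(a * p for a, p in zip(p_ai, p_i))


# Hypothetical ensemble: three stimuli with activation probabilities P_Ai
# and presentation probabilities P_i.
pa = expected_pa([0.01, 0.02, 0.03], [0.5, 0.25, 0.25])
print(f"Pa = {pa:.4f}")
```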

The most obvious meaning is that Pa is the probability that any randomly selected A-unit in the entire A-system will respond to the stimulus in question. It follows from this general definition that Pa is the expected proportion of A-units which will respond to a particular stimulus.

If a particular stimulus has activated an A-unit, the unit gains an increment of value ΔV, which has been set equal to unity. A final interpretation is then that Pa is the expected proportion of exposures on which an A-unit will gain an increment of value. In other words, Pa is the expected increment of value that an A-unit gains, on the average, from one exposure.

Many quantities in this analysis are expressible as functions of the random variable PAi. It is useful to measure the amount of variation in value among the members of a population. The most frequently used measures of variability are the variance and its positive square root, the standard deviation, denoted by σ² and σ, respectively.

In order to evaluate the variance of PAi, assume for σ²(PAi) a series expansion of the type

$$\sigma^2(P_{Ai}) = \sum_{e=0}^{\infty} a_e P_a^{\,e} \qquad (7)$$

In practice, the powers of Pa higher than the second can be neglected, since in most of the following considerations Pa ≪ 1. Thus it can be assumed that

$$\sigma^2(P_{Ai}) = a_0 + a_1 P_a + a_2 P_a^2 \qquad (8)$$
But noting that Pa = 0 if and only if PAi = 0 for all i, then

$$\sigma^2(P_{Ai} \mid P_a = 0) = 0 = a_0 \qquad (9)$$

Therefore

$$\sigma^2(P_{Ai}) = a_1 P_a + a_2 P_a^2 \qquad (10)$$

But Pa = 1 if PAi = 1 for all i. Thus Pa = 1 implies σ²(PAi) = 0, or

$$a_1 + a_2 = 0, \qquad a_2 = -a_1 \qquad (11)$$

and

$$\sigma^2(P_{Ai}) = a_1\,(P_a - P_a^2) \qquad (12)$$

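This being the variance of a population of probability values, each lying between 0 and 1, it cannot exceed Pa(1 − Pa), the value attained when every PAi is either 0 or 1; hence a1 ≤ 1, and the limiting value a1 = 1 gives

$$\sigma^2(P_{Ai}) = P_a(1 - P_a) \qquad (13)$$

This coincides with the value given in reference (1) without justification. The above derivation indicates that this is the only feasible second-order approximation of

$$\sigma^2(P_{Ai}) = f(P_a) \qquad (14)$$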
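The bound behind (13) can be checked numerically: for any ensemble of activation probabilities lying in [0, 1], the variance of PAi never exceeds Pa(1 − Pa), and it equals that value exactly when every PAi is 0 or 1. The two ensembles in this Python sketch are hypothetical and share the same Pa.

```python
def pa_and_variance(p_ai, p_i):
    """Return (Pa, sigma^2(P_Ai)) for an ensemble with presentation probabilities P_i."""
    pa = sum(a * p for a, p in zip(p_ai, p_i))
    var = sum(p * (a - pa) ** 2 for a, p in zip(p_ai, p_i))
    return pa, var


# Extreme ensemble: each stimulus either always or never activates the unit.
pa1, var1 = pa_and_variance([0.0, 1.0], [0.8, 0.2])
# Intermediate ensemble with the same Pa.
pa2, var2 = pa_and_variance([0.1, 0.3], [0.5, 0.5])

for pa, var in [(pa1, var1), (pa2, var2)]:
    print(f"Pa = {pa:.2f}  variance = {var:.4f}  bound Pa(1-Pa) = {pa * (1 - pa):.4f}")
```

Only the all-or-none ensemble reaches the bound, which is the case that (13) describes.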

The analysis for the random environment is carried out because the analytical model used will serve as a basis of analysis for the modifications and extensions of more sophisticated Perceptrons. In order to calculate Pr as a function of Pa, the quantities which appear in the expression for Pr, namely b̄, d̄, and σ(d), will now be calculated for the sum discriminating Alpha system.
Assuming non-overlapping source-sets, the expected controlled bias b̄ will be equal to the number of A-units activated in a source-set by a stimulus times the increment of value gained by an A-unit upon activation, which can be represented by Nar ΔV, where ΔV is assumed to be unity.

However, since overlap exists between source-sets, the effective value gained by a source-set is

$$\bar b = E(N_{ar} - N_{ac}) \qquad (15)$$

where Nac = the expected number of common units activated by the stimulus in question.

Let Nari, NArj, and NAr be defined as follows:

Nari = the number of active A-units per source-set when the ith stimulus is presented

NArj = the number of A-units connected to the response unit Rj, or in general

NAr = the number of A-units connected per response unit, since the variance of NArj is considered negligible.

Then

$$N_{ari} = N_{Ar}\,P_{Ai} \qquad (16)$$

The expected value of Nar may now be calculated as follows:

$$E(N_{ar}) = \sum_i N_{ari} P_i = \sum_i N_{Ar} P_{Ai} P_i = N_{Ar} \sum_i P_{Ai} P_i = N_{Ar} P_a \qquad (17)$$

Similarly, E(Nac) = Pa NAc, where

NAc = the number of A-units connected in common to Ri and Rj, a specified pair of response units

Nac = the number of A-units active in the NAc subset.
Substituting in the expression for the expected controlled bias yields

$$\bar b = P_a (N_{Ar} - N_{Ac})$$

Since NAr − NAc is the number of effective A-units connected to a source-set, denoted by Ne, then b̄ = Pa Ne.

An experiment will be assumed in which the following conventions are used. St has been selected to represent a known stimulus which will be used as a test stimulus. There is nothing special about this stimulus except that it has been chosen to represent any particular stimulus of the stimulus class associated with the R1 source-set.

It is assumed that the numbers of stimuli associated with each response unit, nSr, are all equal. For the sake of calculation, the discrimination between stimuli belonging to response units R1 and R2 will be of concern, with St representative of the R1 stimulus class.

The net bias measures the net value gained by the source-sets upon activation. The expected net bias is a measure of the net difference of value between the R1 and R2 source-sets due to stimulus reinforcements.

When any one stimulus, in particular St, is presented to the Perceptron, the expected value gained by the source-set which responds is equal to the number of effective units activated, E(Nar − Nac), or PaNe. By definition, this value is the expected controlled bias b̄. Since St will be assumed to be associated with R1, the PaNe units activated by St form a set of units which will be called the StR1 subset. Stimuli other than St associated with R1 and R2 and presented during the learning period then activate a portion of the StR1 subset, which tends to reinforce the StR1 subset. The result is to increase the probability of correct response to St during the test period. This overlapping bias reinforcement is measured by the random bias component, d.

The expected proportion of overlap of A-units between two stimuli is Pa for the random environment. Then the expected bias d̄ at the end of the learning period, due to all stimuli belonging to R1 and R2 other than St, is equal to

$$\bar d = \bar V_1 - \bar V_2 = P_a(P_aN_e)(n_{Sr} - 1) - P_a(P_aN_e)\,n_{Sr} = -P_a(P_aN_e) \qquad (18)$$

where V̄1 = Pa(PaNe)(nSr − 1) is the expected value of the R1 source-set due to all stimuli associated with R1 except St, and V̄2 = Pa(PaNe)nSr is the expected value of the R2 source-set due to all stimuli of the R2 class.
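The expected biases of the sum system can be tabulated directly. This Python sketch evaluates b̄ = PaNe and d̄ of (18) from the component values V̄1 and V̄2; the parameter values in the usage line are illustrative only.

```python
def sum_system_expected_biases(pa, n_e, n_sr):
    """Expected controlled bias b and expected random bias d of the sum system.

    b follows b = Pa * Ne; d follows equation (18), built from the expected values
    V1 (n_sr - 1 other R1 stimuli) and V2 (n_sr R2 stimuli).
    """
    b = pa * n_e
    v1 = pa * (pa * n_e) * (n_sr - 1)
    v2 = pa * (pa * n_e) * n_sr
    d = v1 - v2                     # equals -Pa * (Pa * Ne), independent of n_sr
    return b, d


b, d = sum_system_expected_biases(pa=0.005, n_e=10_000, n_sr=1000)
print(b, d, b + d)                  # b + d is the total expected bias Pa * Ne * (1 - Pa)
```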

The second quantity required for the Pr expression is the standard deviation of d, σ(d), which is defined as the positive square root of the variance of d, σ²(d).

The error or random bias component, d, is given by

$$d = V_1 - V_2 \qquad (19)$$

where V1 is the total value of the Nar1 units in the R1 source-set and V2 is the total value of the Nar2 units of the R2 source-set at the time when St is presented. Thus d is the net bias at the time when St is presented, that is, the bias due to all reinforcements of R1 and R2 other than St. V1 is produced by the nSr − 1 stimuli, other than St, associated with R1, and V2 is due to the nSr stimuli which were associated with the R2 source-set.

The total value of either source-set, R1 or R2, taken over all the active A-units which respond to St is given by

$$V_r = \sum_{j=1}^{N_{ar}} v(a_j) \qquad (20)$$

where v(aj) = the value of the aj unit at the end of the learning period due to all stimuli other than St.

Before evaluating σ(d), several quantities will first be calculated.

The variance of Nar is

$$\sigma^2(N_{ar}) = E(N_{ar}^2) - [E(N_{ar})]^2 \qquad (21)$$

The expected value of Nar has been found to equal NAr Pa. The expected value of Nar² can be found as follows:

$$E(N_{ar}^2) = \sum_i N_{ari}^2 P_i = \sum_i N_{Ar}^2 P_{Ai}^2 P_i = N_{Ar}^2 \sum_i P_{Ai}^2 P_i = N_{Ar}^2 P_a \qquad (22)$$

where the value of E(PAi²) follows from (13). Substituting the above values in (21), the variance of Nar becomes

$$\sigma^2(N_{ar}) = N_{Ar}(1 - P_a)P_a \qquad (23)$$

Each A-unit will be exposed nSa times, with a probability Pa of being activated upon each exposure. The value gained by the aj unit upon the ith exposure is PAi ΔV = PAi, assuming the increment of value gained upon activation is unity. The total value gained by the aj unit upon nSa exposures can be represented by

$$v(a_j) = \sum_{i=1}^{n_{Sa}} P_{Ai} \qquad (24)$$




The variance of v(aj) will be

$$\sigma^2[v(a_j)] = \sigma^2\Big[\sum_{i=1}^{n_{Sa}} P_{Ai}\Big] = n_{Sa}\,\sigma^2(P_{Ai}) = n_{Sa}(1 - P_a)P_a \qquad (25)$$

In order to evaluate the variance of the net value of a source-set, σ²(Vr), consider the following derivation. The expression for d is a function of two random variables, Nar and v(aj), so additional care is needed in calculating the variance of a quantity which depends on two random variables.

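In order to calculate the variance of Σ_{j=1}^{Nar} v(aj), in which both Nar and v(aj) are random variables, the following derivation for the variance of a sum of a random number of random variables is necessary. Consider the summation

$$S = \sum_{i=1}^{n} x_i \qquad (26)$$

where n and the xi are random variables; the xi are taken to be mutually independent, identically distributed, and independent of n.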

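The expected values of S and S² are found as follows:

$$E(S) = E\Big[\sum_{i=1}^{n} x_i\Big] = E_n[n\,E(x_i)] = E(x_i)\,E(n) \qquad (27)$$

where in general En f(n) is the expected value of f(n) taken over n.

$$E(S^2) = E\Big[\Big(\sum_{i=1}^{n} x_i\Big)^2\Big] = E\Big[\sum_{i=1}^{n}\sum_{j=1}^{n} x_i x_j\Big] = E_n\big[n\,E(x_i^2) + n(n-1)(E(x_i))^2\big]$$

$$E(S^2) = E(x_i^2)\,E(n) + E(n^2)(E(x_i))^2 - E(n)(E(x_i))^2 \qquad (28)$$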
The variance is defined as the second moment minus the square of the first moment, or

$$\sigma^2(S) = E(S^2) - (E(S))^2 = E(n)\big[E(x_i^2) - (E(x_i))^2\big] + (E(x_i))^2\big[E(n^2) - (E(n))^2\big]$$

$$\sigma^2\Big(\sum_{i=1}^{n} x_i\Big) = E(n)\,\sigma^2(x_i) + (E(x_i))^2\,\sigma^2(n) \qquad (29)$$
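The random-sum variance formula can be checked by simulation. This Python sketch assumes, purely for illustration, that n is uniform on {0, ..., 20} and that each xi is a Bernoulli(0.3) variable independent of n; the simulated variance should agree with the formula to within sampling error.

```python
import random
import statistics


def random_sum_variance(mean_n, var_n, mean_x, var_x):
    """Variance of S = x_1 + ... + x_n for i.i.d. x_i independent of n."""
    return mean_n * var_x + mean_x ** 2 * var_n


def simulate_random_sum_variance(trials, n_max, p, seed=1):
    """Sample S with n uniform on {0, ..., n_max} and x_i ~ Bernoulli(p)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        n = rng.randint(0, n_max)                         # inclusive bounds
        s = sum(1 for _ in range(n) if rng.random() < p)  # n Bernoulli(p) terms
        samples.append(s)
    return statistics.pvariance(samples)


n_max, p = 20, 0.3
mean_n = n_max / 2
var_n = ((n_max + 1) ** 2 - 1) / 12                       # discrete uniform on {0..n_max}
predicted = random_sum_variance(mean_n, var_n, p, p * (1 - p))
observed = simulate_random_sum_variance(50_000, n_max, p)
print(f"predicted {predicted:.3f}, simulated {observed:.3f}")
```

The second term, (E(xi))² σ²(n), is what a naive "n times the variance of one term" estimate would miss.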

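Substituting n = Nar and xj = v(aj) in the above result,

$$\sigma^2(V_r) = E(N_{ar})\,\sigma^2[v(a_j)] + (E\,v(a_j))^2\,\sigma^2(N_{ar}) \qquad (30)$$

$$= (P_aN_{Ar})P_a(1 - P_a)n_{Sa} + (P_an_{Sa})^2 P_a(1 - P_a)N_{Ar} = P_a^2(1 - P_a)N_{Ar}n_{Sa}(1 + P_an_{Sa}) \cong P_a^3(1 - P_a)N_{Ar}(n_{Sa})^2 \qquad (31)$$

the last step holding when Pa nSa ≫ 1.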

It should be noted that those A-units which are common to the two source-sets contribute equal value to both sets; hence they do not affect the net difference in bias between the sets. Since only the numbers of effective units are under comparison, NAr may be replaced by Ne in the above expression.

From statistical theory it is known that the variance of a difference of uncorrelated random variables is equal to the sum of the variances of the two quantities, or

$$\sigma^2(V_1 - V_2) = \sigma^2(V_1) + \sigma^2(V_2) \qquad (32)$$

Now in this particular analysis the variance of each source-set is the same. Thus

$$\sigma^2(d) = \sigma^2(V_1) + \sigma^2(V_2) \qquad (33)$$

$$= 2\sigma^2(V_r) \cong 2P_a^3(1 - P_a)N_e(n_{Sa})^2 \qquad (34)$$
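The approximation in (31) and (34) drops a factor of (1 + PanSa)/(PanSa) relative to the unapproximated form. This Python sketch compares the two forms of σ²(d), with NAr replaced by Ne, so the size of that factor can be seen for given parameters; the parameter values are illustrative.

```python
def var_d_sum_unapproximated(pa, n_e, n_sa):
    """2 * Pa^2 (1 - Pa) Ne nSa (1 + Pa nSa): twice (31) before the approximation."""
    return 2 * pa ** 2 * (1 - pa) * n_e * n_sa * (1 + pa * n_sa)


def var_d_sum_approx(pa, n_e, n_sa):
    """Equation (34): 2 * Pa^3 (1 - Pa) Ne nSa^2, valid when Pa * nSa >> 1."""
    return 2 * pa ** 3 * (1 - pa) * n_e * n_sa ** 2


for pa in (0.005, 0.05):
    ratio = var_d_sum_unapproximated(pa, 10_000, 1000) / var_d_sum_approx(pa, 10_000, 1000)
    print(f"Pa = {pa}: unapproximated / approximate = {ratio:.3f}")
```

The ratio equals 1 + 1/(Pa nSa), so the approximation is good only when each A-unit is expected to be activated many times during learning.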
The final quantity required in order to calculate Pr, the probability of correct response, is P(Δb ≠ −b), the probability that at least one A-unit will respond to the stimulus in question.

Δb ≠ −b implies that Nar − Nac > 0; thus P(Δb ≠ −b) = P(Nar − Nac > 0).

The probability that any particular A-unit, on the average, will not respond to a given stimulus is (1 − Pa). In terms of the number of effective units Ne, the probability that no A-unit will respond to a given stimulus is

$$P(N_{ar} - N_{ac} = 0) = P(\Delta b = -b) = (1 - P_a)^{N_e} \qquad (35)$$

It follows that the complement of this probability is the probability that at least one unit will respond to a stimulus:

$$P(\Delta b \neq -b) = 1 - (1 - P_a)^{N_e} \qquad (36)$$

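From the above considerations, the probability Pr is given by

$$P_r = \big[1 - (1 - P_a)^{N_e}\big]\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{Z} e^{-t^2/2}\,dt \qquad (37)$$

where

$$Z = \frac{\bar b + \bar d}{\sigma(d)} = \sqrt{\frac{N_e(1 - P_a)}{2P_a(n_{Sa})^2}} \qquad (38)$$

By use of normal cumulative distribution tables, the above expression may be evaluated. Plates II, III, and IV show the results of such evaluations.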
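Where the report uses normal cumulative distribution tables, the same evaluation can be done with the error function. The following is a minimal Python sketch of (36) to (38), using `math.erf` for the normal integral; the parameter values are illustrative, and the printed numbers are computed by the sketch, not read from the plates.

```python
import math


def normal_cdf(z):
    """(1/sqrt(2*pi)) * integral from -infinity to z of exp(-t^2/2) dt."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pr_sum_system(pa, n_e, n_sa):
    """Probability of correct response for the sum-discriminating Alpha system, (37)-(38)."""
    if not 0.0 < pa < 1.0:
        raise ValueError("Pa must lie strictly between 0 and 1")
    p_some_response = 1.0 - (1.0 - pa) ** n_e                 # equation (36)
    z = math.sqrt(n_e * (1.0 - pa) / (2.0 * pa * n_sa ** 2))  # equation (38)
    return p_some_response * normal_cdf(z)


for pa in (0.001, 0.005, 0.05, 0.5):
    print(f"Pa = {pa:<6} Pr = {pr_sum_system(pa, n_e=10**6, n_sa=1000):.4f}")
```

As Pa approaches 1, Z approaches 0 and Pr falls toward the chance value 0.5.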

For the previous development of the equation for Pr, and for those to follow, it should be noted that Pr represents the probability of correct response for the stimulus St. However, no special constraints were placed on St as compared with other stimuli, except the assignment of St to a particular class of stimuli. St is therefore a generic stimulus of its assigned class, and all equations concerning St hold equally well for all members of the class of stimuli to which St belongs.

For the various graphs to follow, it will be useful to introduce the following relationships. Let

ω = NRa/NR = the proportion of R-units connected to an A-unit

ωc = NAc/NAr = the proportion of A-units connected in common to the Ri and Rj response units

ωc = P{unit aj belonging to the Ri source-set is common to the Rj source-set} = measure{Rj}/measure{R′}

where R′ = the set of all R-units except Ri. From the definition of NRa it follows that measure{Rj} = NRa − 1, and, of course, measure{R′} = NR − 1. Thus

$$\omega_c = \frac{N_{Ra} - 1}{N_R - 1} = \frac{\omega N_R - 1}{N_R - 1} \qquad (39)$$

If NR is large, ωc approaches ω; and when ωc = 0, ω = 1/NR.
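Equation (39) is easy to tabulate. This Python sketch evaluates ωc from ω and NR and shows the two limiting cases stated in the text; NR = 100 and NR = 10,000 are illustrative values.

```python
def omega_c(omega, n_r):
    """Equation (39): omega_c = (omega * N_R - 1) / (N_R - 1), with N_Ra = omega * N_R."""
    if n_r < 2:
        raise ValueError("at least two response units are required")
    return (omega * n_r - 1) / (n_r - 1)


print(omega_c(1 / 100, 100))     # each A-unit feeds one R-unit: no overlap
print(omega_c(0.5, 10_000))      # large N_R: omega_c close to omega
```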
The curves of Plate II show Pr as a function of Pa for several values of NAr, with 1000 stimuli associated with each response unit and non-overlapping subsets. For a small number of A-units per source-set, Pa has a sharply defined optimum. Increasing the number of A-units increases the probability of correct response over a larger range of Pa. For NAr = 10⁶, Pr is nearly unity over the range from Pa = 0 to Pa = .05.

When it is almost certain that an A-unit will respond, that is, Pa ≈ 1 for a given stimulus, it is evident that Pr falls to chance expectancy (Pr = .5).

Plate III shows a set of curves for Pr as a function of nSr, the number of stimuli associated with each response. The system consists of non-overlapping subsets with a fixed Pa = .005, which is close to the optimum value of Pa. From the curves it can be concluded that the number of stimuli which can be associated with a response unit for correct recognition increases with the number of A-units per subset.

Plate IV gives the same set of curves with the system parameters adjusted for ω = ωc = .5; that is, the expected overlap among source-sets is 50 per cent.

Since these curves are for the ideal environment, each stimulus of each class is independent of, or uncorrelated with, every other stimulus. In any attempt to simulate this in a realistic environment, however, there would inevitably be a relationship between stimuli of a class. This would lead to mutual support between stimuli of a given category: an increasing number of A-units would tend to be activated in common by stimuli of the same class, which would in turn increase the bias in the desired direction, making Pr higher for a given set of parameters.
In the following section the probability of correct response Pr will be calculated for the mean discriminating Alpha system. With mean discrimination, the Perceptron responds to the mean values of the active subsets of A-units rather than to their sum values. In this system the component of variation of the controlled bias Δb is zero, since it is due only to the variation in the number of A-units activated by the test stimulus. The mean value is taken over all of the A-units which are activated by St in each source-set. Hence the expected bias B̄μ is the same as the bias in the sum system divided by the number of active effective units per source-set, which can be represented by

$$\bar B_\mu = \frac{\bar b + \bar d}{P_aN_e} = \frac{P_aN_e - P_a^2N_e}{P_aN_e} = 1 - P_a \qquad (40)$$


As before, the variance of the value of an A-unit after nSa exposures is

$$\sigma^2[v(a_j)] = P_a(1 - P_a)\,n_{Sa} \qquad (41)$$

For Nar active A-units per set, the variance of the mean value of the A-units of one source-set is given by

$$\sigma^2(\bar v) = \frac{\sigma^2[v(a_j)]}{N_{ar}} \qquad (42)$$

Assuming disjunct sets (non-overlapping source-sets), the variance of the difference of the two means is

$$\sigma^2(d_\mu) = \sigma^2(\bar v_1) + \sigma^2(\bar v_2) \qquad (43)$$

and, assuming the variances of both source-sets to be equal, the standard deviation of d, the positive square root of the variance, is

$$\sigma(d_\mu) = \sqrt{\frac{2\,\sigma^2[v(a_j)]}{N_{ar}}} \qquad (44)$$

or, substituting for σ²[v(aj)] and Nar = PaNAr and simplifying,

$$\sigma(d_\mu) = \sqrt{\frac{2(1 - P_a)\,n_{Sa}}{N_{Ar}}} \qquad (45)$$

Allowing for the correction for overlapping sets, NAr may be replaced by Ne (the effective number of A-units which contribute to the net bias between sets), and the above equation becomes

$$\sigma(d_\mu) = \sqrt{\frac{2n_{Sa}(1 - P_a)}{N_e}} \qquad (46)$$

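Correcting as before for the probability that no A-unit will respond, an analogous expression for Pr for the mean system can now be written:

$$P_r(\mu) = \big[1 - (1 - P_a)^{N_e}\big]\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{Z} e^{-t^2/2}\,dt$$

where

$$Z = \frac{\bar B_\mu}{\sigma(d_\mu)} = \frac{1 - P_a}{\sigma(d_\mu)} = \sqrt{\frac{N_e(1 - P_a)}{2n_{Sa}}} \qquad (47)$$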
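The advantage of mean discrimination can be seen by evaluating the Z of (47) alongside that of (38) for the same parameters. This Python sketch repeats the normal-integral helper so that it runs on its own; the parameter values are illustrative.

```python
import math


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_some_response(pa, n_e):
    """Equation (36): probability that at least one effective A-unit responds."""
    return 1.0 - (1.0 - pa) ** n_e


def pr_sum(pa, n_e, n_sa):
    """Sum-discriminating Alpha system, Z from equation (38)."""
    return p_some_response(pa, n_e) * normal_cdf(math.sqrt(n_e * (1 - pa) / (2 * pa * n_sa ** 2)))


def pr_mean(pa, n_e, n_sa):
    """Mean-discriminating Alpha system, Z from equation (47)."""
    return p_some_response(pa, n_e) * normal_cdf(math.sqrt(n_e * (1 - pa) / (2 * n_sa)))


for pa in (0.005, 0.05, 0.2):
    print(f"Pa = {pa:<5}  sum: {pr_sum(pa, 10_000, 500):.4f}  mean: {pr_mean(pa, 10_000, 500):.4f}")
```

The two Z values differ by a factor of sqrt(Pa nSa), which is why the mean system tolerates a much wider range of Pa.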

Pr(μ) as a function of Pa is illustrated by the set of curves in Plate V. The broken curve gives Pr(Σ) for NAr = 10,000, for comparison of the sum and the mean value systems. It is quite evident that, under comparable conditions, the mean value system allows a much wider range of Pa for relatively good accuracy of recognition.

Plate VI shows a definite advantage for the μ-system as compared with Plate III for the Σ-system. For instance, with NAr = 10,000 A-units per source-set, the value of Pr(μ) remains nearly unity for about 500 associations per response and then falls off gradually. In addition, the graph shows that for a very large nSr, Pr reaches the chance expectancy of 0.5.


Effect on the Alpha Systems with Variation in nSr


In all of the previous analysis, nSr, the number of independent stimuli associated with each response unit, has been assumed to be equal for all response units. In reality, nSr may be considered as a random variable. It will be shown that in the ideal, or random, environment the Alpha system is incapable of efficient operation when nSr is random.

First consider the case of non-overlapping subsets for the sum system of the Alpha Perceptron. Upon allowing nSr to vary, it becomes quite evident that correct response is almost impossible. Consider the circumstances under which R1 is associated with nS1 stimuli and R2 is associated with nS2 stimuli. Pa may represent, on the average, the value gained per A-unit per stimulus. It follows that the value of the R2 source-set at the end of the learning period is Pa Nar nS2, the value of the R1 set before the presentation of the last stimulus is (nS1 − 1)Pa Nar, and the value gained from the last stimulus is Nar. The expected net bias at the end of the learning period is

$$d = V_1 - V_2 = (n_{S1} - 1)P_aN_{ar} + N_{ar} - n_{S2}P_aN_{ar} \qquad (48)$$
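Equation (48) makes the sensitivity to unequal class sizes concrete. This Python sketch evaluates it for equal and unequal nS1 and nS2, with illustrative values of Pa and Nar.

```python
def alpha_sum_net_bias(n_s1, n_s2, pa, n_ar):
    """Equation (48): (nS1 - 1) Pa Nar + Nar - nS2 Pa Nar for the sum-discriminating Alpha system."""
    return (n_s1 - 1) * pa * n_ar + n_ar - n_s2 * pa * n_ar


for n_s1, n_s2 in [(1000, 1000), (500, 1000), (1000, 500)]:
    d = alpha_sum_net_bias(n_s1, n_s2, pa=0.005, n_ar=1000)
    print(f"nS1 = {n_s1:4d}, nS2 = {n_s2:4d}: d = {d:8.1f}")
```

With these numbers, halving nS1 relative to nS2 turns the expected bias negative, which is the failure described next.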

As can readily be seen, if nS1 is less than nS2, the expected bias may easily be negative, and the probability of correct response for any stimulus of the nS1 class which is presented in the test period is very low. In this case, for correct response the value contributed by the stimulus must not only be positive but must also be of sufficient magnitude to overcome the negative net bias of the system. If nS2 is much greater than nS1, it is impossible to obtain a correct response.

Now consider the condition in which the A-subsets are overlapping and the number of response units connected to each A-unit is large. The values nS1, nS2, ... will be drawn from some distribution of nSr, not necessarily a normal distribution. The total number of stimuli presented to the system is not controlled; however, its expected value is NR E(nSr). In this hypothetical experiment the variance of nSr will be considered to be large.

Under the above conditions a quantitative analysis of the mean discrimination Alpha Perceptron will be made. Before proceeding with the analysis, the following relationships will be needed.

If x is a random variable and f is any randomly varying function whose distribution depends on the value of x, then the expected value of f(x) is equal to the mean value, taken over all x, of the conditional mean value of f(x) relative to the hypothesis x = ξ, or in notational form

$$E[f(x)] = E_x\,E[f(x) \mid x = \xi] \qquad (49)$$

Similarly, the expected value of the square of the same function is

$$E[f^2(x)] = E_x\,E[f^2(x) \mid x = \xi] \qquad (50)$$

The variance of such a randomly varying function can be expressed in terms of the above equalities by inserting second and third terms which sum to zero:

$$\sigma^2 f(x) = E_x E[f^2(x) \mid x = \xi] - E_x\{E[f(x) \mid x = \xi]\}^2 + E_x\{E[f(x) \mid x = \xi]\}^2 - \{E_x E[f(x) \mid x = \xi]\}^2 \qquad (51)$$

$$\sigma^2 f(x) = E_x\,\sigma^2[f(x) \mid x = \xi] + \sigma_x^2\,E[f(x) \mid x = \xi] \qquad (52)$$
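Equation (52) splits a variance into the mean of the conditional variances plus the variance of the conditional means. This Python sketch checks it exactly with rational arithmetic for a hypothetical case: x uniform on {1, 2, 3, 4} and, given x, f binomially distributed with x trials and success probability 1/3.

```python
from fractions import Fraction
from math import comb


def binomial_moments(n, p):
    """Exact mean and variance of a Binomial(n, p) variable, by enumerating its pmf."""
    pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * w for k, w in enumerate(pmf))
    var = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, var


def variance_two_ways(xs, p):
    w = Fraction(1, len(xs))                                   # x uniform over xs
    cond = [binomial_moments(x, p) for x in xs]                # (E(f|x), sigma^2(f|x))
    e_f = sum(w * m for m, _ in cond)                          # equation (49)
    e_f2 = sum(w * (v + m * m) for m, v in cond)               # equation (50)
    direct = e_f2 - e_f ** 2
    decomposed = (sum(w * v for _, v in cond)                  # E_x sigma^2(f | x)
                  + sum(w * (m - e_f) ** 2 for m, _ in cond))  # sigma_x^2 E(f | x)
    return direct, decomposed


direct, decomposed = variance_two_ways([1, 2, 3, 4], Fraction(1, 3))
print(direct, decomposed)
```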
Continuing with the experiment, let the test stimulus St activate n effective cells (non-overlapping units) in the R1 source-set, represented by a1, a2, ..., an. Furthermore, let v(aj) = the value of the aj unit at the end of the learning series, except for the effect of St. The variance of the conditional mean value of one source-set is represented by

$$\sigma^2\Big(\frac{1}{n}\sum_{j=1}^{n} v(a_j)\Big)$$

which is of the form of equation (52). Substituting in equation (52),

$$\sigma^2\Big(\frac{\sum_{j=1}^{n} v(a_j)}{n}\Big) = E_n\,\sigma^2\Big(\frac{\sum_{j=1}^{n} v(a_j)}{n}\,\Big|\,n\Big) + \sigma_n^2\,E\Big(\frac{\sum_{j=1}^{n} v(a_j)}{n}\,\Big|\,n\Big) \qquad (53)$$

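The expected value of v(aj) is Pa NRa E(nSr), so that

$$E\Big(\frac{\sum_{j=1}^{n} v(a_j)}{n}\,\Big|\,n\Big) = P_a\,N_{Ra}\,E(n_{Sr}) \qquad (54)$$

Consequently, E(Σ v(aj)/n) is independent of n. Therefore the second term of the above equation is zero, and it reduces to

$$\sigma^2\Big(\frac{\sum_{j=1}^{n} v(a_j)}{n}\Big) = E_n\,\sigma^2\Big(\frac{\sum_{j=1}^{n} v(a_j)}{n}\,\Big|\,n\Big) \qquad (55)$$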

In order to express the total value of a source-set, let n, the number of non-common A-units reacting to St, be a fixed value. Then

$$\sum_{j=1}^{n} v(a_j) = \sum_{j=1}^{n}\sum_{k=1}^{n_{S1}} x_1(a_j,k) + \sum_{j=1}^{n}\sum_{r=3}^{N_R}\sum_{k=1}^{n_{Sr}} x_r(a_j,k) \qquad (56)$$

where

xr(aj,k) = 1 if the kth stimulus associated with response r activates aj,
xr(aj,k) = 0 under all other conditions.

The total value of the R1 source-set is the value gained in the set of units non-common to R2 over nS1 stimulations, plus the value gained by those A-units which are common to other sets and which gain value due to stimuli associated with those other sets. This second term is a summation taken over all possible response units, r = 3 to NR, and over all nSr, the number of stimuli associated with response r.

For the sake of clarity in further calculation, assume that out of the n cells a1 ... an, mr are in the r source-set. Then, of course, m1 = n and m2 = 0.

The general term of the summation is independent for different values of r.

The variance of each term for different values of r takes on the form of the variance of a random sum derived above.

For r = 3, ..., NR,

$$\sigma^2\Big[\sum_{k=1}^{n_{Sr}}\sum_{j=1}^{m_r} x_r(a_j,k)\Big] = \sigma^2\Big[\sum_{j=1}^{m_r} x_r(a_j,k)\Big]\,E(n_{Sr}) + \Big[E\sum_{j=1}^{m_r} x_r(a_j,k)\Big]^2\,\sigma^2(n_{Sr})$$

$$= m_rP_a(1 - P_a)\,E(n_{Sr}) + (m_rP_a)^2\,\sigma^2(n_{Sr}) \qquad (57)$$

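For r = 2, the corresponding quantity is zero. With r = 1, the variance is

$$\sigma^2\Big[\sum_{k=1}^{n_{S1}}\sum_{j=1}^{n} x_1(a_j,k)\Big] = \sigma^2\Big[\sum_{j=1}^{n} x_1(a_j,k)\Big]\,E(n_{Sr}) + \Big[E\sum_{j=1}^{n} x_1(a_j,k)\Big]^2\,\sigma^2(n_{Sr}) \qquad (58)$$

$$= nP_a(1 - P_a)\,E(n_{Sr}) + (nP_a)^2\,\sigma^2(n_{Sr}) \qquad (59)$$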


The total variance of Σ_{j=1}^{n} v(aj) is

$$\sigma^2\Big[\sum_{j=1}^{n} v(a_j)\Big] = P_a(1 - P_a)\,E(n_{Sr})\sum_{r=1}^{N_R} m_r + P_a^2\,\sigma^2(n_{Sr})\sum_{r=1}^{N_R} m_r^2 \qquad (60)$$


The summation Σ_{r=1}^{NR} mr represents the total number of R-connections originating from the n cells, and is equal to nNRa. In order to compute Σ_{r=1}^{NR} mr², the variance in the intersections of different source-sets is neglected. Other than m1 and m2, mr = ωc′n, where ωc′ = (NRa − 1)/(NR − 2). ωc′ is found in the same manner as ωc in (39), except that measure{R′} = NR − 2 for ωc′. Then

$$\sum_{r=1}^{N_R} m_r^2 = n^2 + \sum_{r=3}^{N_R} \omega_c'^2 n^2 = n^2\big[1 + (N_R - 2)\,\omega_c'^2\big] = n^2\Big[1 + \frac{(N_{Ra} - 1)^2}{N_R - 2}\Big] \qquad (61)$$
The variance required for equation (53) is then

$$\sigma^2\Big(\frac{\sum_{j=1}^{n} v(a_j)}{n}\,\Big|\,n\Big) = \frac{P_a(1 - P_a)}{n}\,E(n_{Sr})\,N_{Ra} + P_a^2\,\sigma^2(n_{Sr})\Big[\frac{(N_{Ra} - 1)^2}{N_R - 2} + 1\Big] \qquad (62)$$
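The bookkeeping behind (61) and (62) can be checked by building the list of mr values explicitly (m1 = n, m2 = 0, and ωc′n for each of the remaining NR − 2 sets) and comparing with the closed forms. In this Python sketch all parameter values are illustrative.

```python
def m_r_values(n, n_ra, n_r):
    """m_1 = n, m_2 = 0, and m_r = omega_c' * n for r = 3..N_R (intersection variance neglected)."""
    omega_c_prime = (n_ra - 1) / (n_r - 2)
    return [n, 0] + [omega_c_prime * n] * (n_r - 2)


def var_mean_given_n(pa, n, mean_nsr, var_nsr, n_ra, n_r):
    """Conditional variance of the source-set mean value via (60), using explicit m_r."""
    m = m_r_values(n, n_ra, n_r)
    total_var = pa * (1 - pa) * mean_nsr * sum(m) + pa ** 2 * var_nsr * sum(x * x for x in m)
    return total_var / n ** 2


def var_mean_given_n_closed_form(pa, n, mean_nsr, var_nsr, n_ra, n_r):
    """Equation (62)."""
    return (pa * (1 - pa) / n) * mean_nsr * n_ra + pa ** 2 * var_nsr * ((n_ra - 1) ** 2 / (n_r - 2) + 1)


args = dict(pa=0.01, n=10, mean_nsr=100, var_nsr=400, n_ra=11, n_r=102)
print(var_mean_given_n(**args), var_mean_given_n_closed_form(**args))
```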

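Now the expected value of this variance taken with respect to n, with En(1/n) approximated by 1/E(n) = 1/(PaNe), yields for the variance of the mean value in the R1 source-set:

$$\sigma^2\Big(\frac{\sum v(a_j)}{n}\Big) = E_n\,\sigma^2\Big(\frac{\sum v(a_j)}{n}\,\Big|\,n\Big) \cong \frac{1 - P_a}{N_e}\,E(n_{Sr})\,N_{Ra} + P_a^2\,\sigma^2(n_{Sr})\Big[\frac{(N_{Ra} - 1)^2}{N_R - 2} + 1\Big] \qquad (63)$$

Since the above computation was general, the variance of the R2 set is the same.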

Thus the total variance of the net random bias under the conditions of random nSr for the μ-discrimination of the Alpha Perceptron is given by twice the above variance, so that

$$\sigma^2(d_\mu) = \frac{2(1 - P_a)}{N_e}\,E(n_{Sr})\,N_{Ra} + 2P_a^2\,\sigma^2(n_{Sr})\Big[\frac{(N_{Ra} - 1)^2}{N_R - 2} + 1\Big] \qquad (64)$$

Again assuming a normal distribution for the net bias, the probability of correct response with random nSr for the μ-system of the Alpha Perceptron is given by the expression

$$P_r(\mu) = \big[1 - (1 - P_a)^{N_e}\big]\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{Z} e^{-t^2/2}\,dt \qquad (65)$$

where

$$Z = \frac{1 - P_a}{\sigma(d_\mu)} \qquad (66)$$

which reduces to equation (47) when σ²(nSr) = 0.
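Equations (64) to (66) can be evaluated in the same way. So that the comparison with (47) is like for like, this Python sketch assumes that the nSa of (47) corresponds to NRa E(nSr); under that assumption the two expressions coincide when σ²(nSr) = 0, and a nonzero σ²(nSr) lowers Pr. The parameter values are illustrative, not those of Plate VII.

```python
import math


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pr_mean_random_nsr(pa, n_e, mean_nsr, var_nsr, n_ra, n_r):
    """Pr for the mean-discriminating Alpha system with random nSr, equations (64)-(66)."""
    var_d = (2 * (1 - pa) / n_e) * mean_nsr * n_ra \
        + 2 * pa ** 2 * var_nsr * ((n_ra - 1) ** 2 / (n_r - 2) + 1)   # (64)
    z = (1 - pa) / math.sqrt(var_d)                                    # (66)
    return (1 - (1 - pa) ** n_e) * normal_cdf(z)                       # (65)


def pr_mean_fixed(pa, n_e, n_sa):
    """Equation (47), for comparison."""
    return (1 - (1 - pa) ** n_e) * normal_cdf(math.sqrt(n_e * (1 - pa) / (2 * n_sa)))


base = dict(pa=0.005, n_e=10_000, mean_nsr=50, n_ra=10, n_r=100)
for sd in (0, 10, 25, 50):
    print(f"sd(nSr) = {sd:3d}: Pr = {pr_mean_random_nsr(var_nsr=sd ** 2, **base):.4f}")
print(f"fixed nSr via (47): Pr = {pr_mean_fixed(0.005, 10_000, n_sa=10 * 50):.4f}")
```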

Plate VII illustrates quite clearly the effects of permitting a large variation in nSr.

Several curves are plotted with Pr as a function of nSr for a system with 100 response units and 10,000 A-units.

The broken curve represents the same system with no variation in nSr. A quite definite decrease in the accuracy of performance of the system results from allowing nSr to vary. The best operation results with disjunct sets. It should be kept in mind that the ideal environment condition is imposed, in which each stimulus of each class is entirely independent of any other stimuli.

A similar situation exists when the size of the stimuli is allowed to vary. A qualitative examination will be sufficient to demonstrate this point.

Consider the case where the stimuli of class R2 are much larger than the stimuli associated with the R1 source-set. Pa, the expected probability that any A-unit will be activated, will be greater for stimuli of class R2, and the mean value of the R2 set will grow faster than the mean value of the R1 set. Then, if the test stimulus St, which is at a disadvantage in measure, is shown to the system, its reinforcement will probably not be sufficient to overcome the bias already favoring the R2 set, and an incorrect response will result.

From this example one can see that the effect of variation in the size of stimuli between the source-sets is to decrease the accuracy of the system.

The decrease in performance of a system due to variation in stimulus size is smaller than the effect of variation in nSr, since Pa can be held reasonably constant over a wide range of retinal size variation.

THE  GAMMA  PERCEPTRONS  FOR  IDEAL  ENVIRONMENT 

Sum  Discriminating  Gamma  System 

The same logical analysis will be made for the Gamma Perceptron as was made for the Alpha system, and it will be carried out for both methods of discrimination.

The Gamma system holds all sets at equal levels, and it also has the advantage of keeping the mean value of the entire system constant. In terms of electronic simulation of this system, this advantage would prevent the saturation of integrators and counters that would occur in the Alpha Perceptron. In a physiological system, it could mean that the cells are required to maintain an optimal range of sensitivity.

"The Gamma system can be thought of, physiologically, as involving a constant chemical or nutrient distribution rate, which is normally just sufficient to balance the expected rate of utilization."

Now  we  will  proceed  with  the  analysis  of  the  Gamma  system. 

For sum discrimination, the expected controlled bias b̄ is the same as in the Alpha system, since b̄ is the value gained by the R1 source-set due to the presentation of St. Therefore b̄ = PaNe. Let nS1 and nS2 be the numbers of stimuli associated with the R1 and R2 sets, respectively. If the test stimulus is deleted from the learning series, then nS1 − 1 stimuli are associated with R1 and nS2 stimuli are associated with R2.

The value gained by the active A-units, each of which gains one unit of value per activation, is (nS1 - 1)Pa²Ne, but the value lost by the inactive A-units of the dominant set is

$$(n_{S1}-1)\,P_aN_e\,(1-P_a)\,E\left(\frac{N_{ar}}{N_{Ar}-N_{ar}}\right)$$


Therefore

$$V_1 = (n_{S1}-1)\,P_aN_e\left[P_a-(1-P_a)\,E\left(\frac{N_{ar}}{N_{Ar}-N_{ar}}\right)\right] \tag{66}$$

Similarly, for the R2 source-set,

$$V_2 = n_{S2}\,P_aN_e\left[P_a-(1-P_a)\,E\left(\frac{N_{ar}}{N_{Ar}-N_{ar}}\right)\right] \tag{67}$$

Then the expected net bias due to all stimuli associated with R1 and R2, except St, is

$$\bar d = V_1 - V_2 = (n_{S1}-1-n_{S2})\,P_aN_e\left[P_a-(1-P_a)\,E\left(\frac{N_{ar}}{N_{Ar}-N_{ar}}\right)\right] \tag{68}$$


¹ Rosenblatt, Frank. "The Perceptron: A theory of statistical separability in cognitive systems." Cornell Aeronautical Laboratory, Inc., Report No. VG-1196-G-1, January 1958.

The expectation E(Nar/(NAr - Nar)) will now be evaluated. Multiplying numerator and denominator by Pa, and replacing NArPa by Nar (NArPa is the expected number of active units),

$$E\left(\frac{N_{ar}}{N_{Ar}-N_{ar}}\right) = E\left(\frac{N_{ar}P_a}{N_{Ar}P_a-N_{ar}P_a}\right) \approx E\left(\frac{N_{ar}P_a}{N_{ar}-N_{ar}P_a}\right) = \frac{P_a}{1-P_a} \tag{69}$$
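The approximation in (69) can be checked numerically. The Python sketch below is an illustration rather than part of the original analysis. It draws Nar as the number of active units among NAr independent units, each active with probability Pa, and compares the Monte Carlo mean of Nar/(NAr - Nar) with Pa/(1 - Pa). The function name and parameter values are assumptions chosen for the example.

```python
import random


def expected_loss_ratio(n_units, p_a, trials=4000, seed=0):
    """Monte Carlo estimate of E(Nar / (NAr - Nar)).

    NAr = n_units A-units, each active independently with probability p_a.
    Trials in which every unit is active (ratio undefined) are skipped.
    """
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(trials):
        n_active = sum(1 for _ in range(n_units) if rng.random() < p_a)
        if n_active < n_units:
            total += n_active / (n_units - n_active)
            count += 1
    return total / count


if __name__ == "__main__":
    n_units, p_a = 500, 0.2
    print("Monte Carlo:", expected_loss_ratio(n_units, p_a))
    print("Pa/(1 - Pa):", p_a / (1 - p_a))
```

For a few hundred units the two values agree closely. The small remaining gap reflects the fact that replacing NArPa by Nar is only a first-order approximation.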


Substituting this quantity into the expression for d̄ yields

$$\bar d = (n_{S1}-1-n_{S2})\,P_aN_e\left[P_a-(1-P_a)\frac{P_a}{1-P_a}\right] = 0 \tag{70}$$

The expected total net bias is then

$$\bar B = \bar b + \bar d = P_aN_e \tag{71}$$

Now let a stimulus activate m1 units (a1 ... am1) in the R1 source-set (exclusive of common units), and m2 units (b1 ... bm2) in the R2 source-set.

For fixed nSr stimuli per source-set, the R1 source-set component d1 given m1 is

$$d_1\,|\,m_1 = \sum_{r=1}^{N_R}\sum_{h=1}^{n_{Sr}}\sum_{i=1}^{m_1}x_r(a_i,h) \tag{72}$$


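where

$$x_r(a_i,h) = \begin{cases} 1 & \text{when } a_i \text{ of the } r \text{ source-set is activated by the } h\text{th stimulus associated to response } r,\\ -E\left(\dfrac{N_{ar}}{N_{Ar}-N_{ar}}\right) = -\dfrac{P_a}{1-P_a} & \text{when } a_i \text{ is of the } r \text{ source-set but is not activated by the } h\text{th stimulus},\\ 0 & \text{for all other conditions.} \end{cases}$$

For fixed nSr, the variance of d1 given m1 is

$$\begin{aligned}
\sigma^2(d_1\,|\,m_1) &= \sum_{r=1}^{N_R}E(n_{Sr})\,\sigma^2\!\left(\sum_{i=1}^{m_1}x_r(a_i,h)\right)\\
&= E(n_{S1})\,\sigma^2\!\left(\sum_{i=1}^{m_1}x_1(a_i,h)\right) + E(n_{S2})\,\sigma^2\!\left(\sum_{i=1}^{m_1}x_2(a_i,h)\right) + \sum_{r=3}^{N_R}E(n_{Sr})\,\sigma^2\!\left(\sum_{i=1}^{m_1}x_r(a_i,h)\right)
\end{aligned}\tag{73}$$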


By the definition of m1, the second term in the above equation is zero: the m1 units exclude the units common to the R1 and R2 source-sets, so none of them belongs to the R2 source-set. In this formulation, the variance of one source-set due to all stimuli associated with all response units is to be calculated.

In the Gamma system the hth stimulus associated with R1 activates a certain number of non-common units, m1. The increment gained by each active unit is 1, and the value lost by each inactive unit in the R1 set is, on the average, Pa/(1 - Pa). Averaged over all stimuli, the net value gained by a unit per stimulus is

$$P_a\cdot 1-(1-P_a)\frac{P_a}{1-P_a} = 0$$

It was previously shown that the expected value of a source-set is zero. The variance is therefore equal to the second moment in this case, and it is calculated as follows:

$$\sigma^2\!\left(\sum_{i=1}^{m_1}x_r(a_i,h)\right) = m_1\,\sigma^2[x_r(a_i,h)] = m_1\,E[x_r^2(a_i,h)] = m_1\left[P_a+(1-P_a)\left(\frac{P_a}{1-P_a}\right)^2\right] = m_1\frac{P_a}{1-P_a} \tag{74}$$

Consequently, the variance of d1 given m1 is

$$\sigma^2(d_1\,|\,m_1) = E(n_{S1})\,m_1\frac{P_a}{1-P_a} + (N_{Ra}-1)\,E(n_{Sr})\,m_1\frac{P_a}{1-P_a} = E(n_{Sr})\,m_1N_{Ra}\frac{P_a}{1-P_a} \tag{75}$$

The variance of d2 given m2 is given by the previous calculation with m1 replaced by m2. The total variance of both the R1 and R2 source-sets is then

$$\sigma^2(d\,|\,m_1,m_2) = \sigma^2(d_1\,|\,m_1)+\sigma^2(d_2\,|\,m_2) = E(n_{Sr})\frac{P_a}{1-P_a}N_{Ra}\,(m_1+m_2) \tag{76}$$
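The per-unit facts behind (74) can be checked exactly. A unit's increment per stimulus has zero mean and second moment Pa/(1 - Pa), and both follow from a two-point distribution. The Python sketch below assumes, as the text does, that an inactive unit loses the expected amount Pa/(1 - Pa). The function names are illustrative.

```python
def unit_increment_moments(p_a):
    """Mean and variance of one A-unit's value change per stimulus (Gamma system).

    x = 1 with probability Pa (unit active),
    x = -Pa/(1 - Pa) with probability 1 - Pa (unit inactive; expected loss from (69)).
    """
    loss = p_a / (1.0 - p_a)
    mean = p_a * 1.0 + (1.0 - p_a) * (-loss)
    second_moment = p_a * 1.0 ** 2 + (1.0 - p_a) * loss ** 2
    return mean, second_moment - mean ** 2


def source_set_variance(m1, p_a):
    """Variance of the sum over m1 independent units, as in (74): m1 * Pa/(1 - Pa)."""
    _, var = unit_increment_moments(p_a)
    return m1 * var


if __name__ == "__main__":
    mean, var = unit_increment_moments(0.1)
    print(mean, var, 0.1 / 0.9)
```

Since the mean is zero, the variance equals the second moment, which is the step used in the text.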

On the average, the number of units in a source-set activated by any stimulus is equal to PaNe, so m1 and m2 may each be replaced by PaNe. The variance of d in the sum system is then

$$\sigma^2(d_\Sigma) = 2E(n_{Sr})\,P_a(1-P_a)^{-1}N_{Ra}\,(P_aN_e) \tag{77}$$

From the above equation it may be noted that the variance of nSr does not enter into the final variance of d in any way; only its expected value E(nSr) appears. This indicates that in the Gamma Perceptron the restriction that nSr must be uniform is removed.

The expression for the probability of correct response, in terms of the system parameters, for the Gamma system with sum discrimination and varying nSr is

$$P_{r(\Sigma)} = \left[1-(1-P_a)^{N_e}\right]\frac{1}{\sqrt{2\pi}}\int_{-Z}^{\infty}e^{-t^2/2}\,dt$$

$$\text{where}\quad Z = \frac{P_aN_e}{\sqrt{2E(n_{Sr})\,P_a^2(1-P_a)^{-1}N_{Ra}N_e}} = \frac{\sqrt{N_e(1-P_a)}}{\sqrt{2N_{Ra}E(n_{Sr})}} \tag{78}$$

Mean  Discriminating  Gamma  System 

A quite similar development follows for the mean-discriminating system. The expected bias is the same as in the previous case divided by the average number of active units, PaNe. Therefore b̄ = 1 and d̄ = 0; hence B̄ = 1.

As before, E(d1|m1) = 0 and E(d2|m2) = 0.

The variance of the mean value of d1, given m1, is

$$\frac{\sigma^2(d_1\,|\,m_1)}{m_1^2} = E(n_{Sr})\,\frac{P_a(1-P_a)^{-1}N_{Ra}}{m_1} \tag{79}$$

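The total variance of d may be expressed as

$$\sigma^2(d_\mu) = \frac{\sigma^2(d_1\,|\,m_1)}{m_1^2}+\frac{\sigma^2(d_2\,|\,m_2)}{m_2^2} \tag{80}$$

and, since m1 = m2 = PaNe on the average,

$$\sigma^2(d_\mu) = 2E(n_{Sr})\,\frac{P_aN_{Ra}}{(1-P_a)\,P_aN_e} = 2E(n_{Sr})\,\frac{N_{Ra}}{N_e(1-P_a)} \tag{81}$$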

The expression for Z may be written

$$Z = \frac{\sqrt{N_e(1-P_a)}}{\sqrt{2E(n_{Sr})\,N_{Ra}}} \tag{82}$$

which is the same as Z for Pr(Σ).

Thus, under ideal environment conditions, for the Gamma system

$$P_{r(\Sigma)} = P_{r(\mu)} \tag{83}$$


Comparison of performance for the Alpha and Gamma systems is illustrated by Plate VIII, which shows Pr(μ) versus E(nSr). The graph is for ideal environment conditions and assumes that the variance of nSr is equal to half of its expected value. The Gamma system has a definite advantage under these conditions.


ANALYSIS OF THE ALPHA SYSTEMS FOR
DIFFERENTIATED ENVIRONMENT


Alpha  Perceptron  for  Sum  Discrimination 

All analysis up to this point of the performance of the various Perceptron systems has been for an experiment under the assumption of an ideal environment. It has been assumed that each stimulus belonging to a particular class was chosen to be a random collection of points on the retinal area of the sensory cells. In this random environment there was no correlation among any stimuli within any class. Likewise, the stimuli for the different classes were chosen at random, so the correlation of stimuli between classes is also zero. The only restriction was that the measure of the stimuli be uniform.

It has been shown that the performance of all Perceptron systems decays to chance expectancy for correct response under random environment conditions. This result, of course, could have been predicted.

However, the previous analysis served to compare the possible Perceptron systems and to provide an analytical model.

It is now important to determine the reaction of the Perceptron systems to classes of stimuli having some kind of relationship and correlation among the stimuli of a class.

In the remaining analysis of this report, the performance of the Perceptron systems will be evaluated in an experiment in which non-random, or differentiated, environment conditions exist. A differentiated environment means that the stimuli of any particular class have some correlation in their characteristics. For example, one class might be circles at different locations within a defined retinal region, and the other class might be a set of squares at various locations within the same region. Under these conditions it will be shown that the recognition performance of the Perceptron can be made to approach an asymptotic level different from chance expectancy as the number of stimuli increases.

Before proceeding to the analysis of the differentiated environment, several new symbols and concepts need discussion.

In the ideal environment case it was assumed that, since there was no correlation between stimuli, the expected proportion of overlap of A-units between St and stimuli of class 1 or 2 was equal to Pa. In the present case there exists a relationship between stimuli of the same class, which will be measured by various forms of Pc. In general, Pc may be defined as the conditional probability that an A-unit activated by one stimulus S1 will also be activated by another stimulus S2.
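Defining Pc as a conditional probability suggests a direct empirical estimate: the fraction of the A-units activated by S1 that are also activated by S2. The Python sketch below is illustrative. The set-of-indices representation and the function names are assumptions. Its second function checks the ideal-environment case described above, in which independent random stimuli give an overlap close to Pa.

```python
import random


def conditional_overlap(active_s1, active_s2):
    """Empirical Pc: fraction of A-units activated by S1 that S2 also activates.

    active_s1, active_s2: sets of indices of the A-units each stimulus activates.
    """
    if not active_s1:
        raise ValueError("S1 activates no A-units, so Pc is undefined")
    return len(active_s1 & active_s2) / len(active_s1)


def random_environment_pc(n_units=20000, p_a=0.1, seed=2):
    """Ideal (random) environment: each unit responds to each stimulus with
    probability p_a, independently, so the estimated Pc should be close to p_a."""
    rng = random.Random(seed)
    s1 = {i for i in range(n_units) if rng.random() < p_a}
    s2 = {i for i in range(n_units) if rng.random() < p_a}
    return conditional_overlap(s1, s2)


if __name__ == "__main__":
    print(conditional_overlap({1, 2, 3, 4}, {3, 4, 5}))  # 0.5
    print(random_environment_pc())
```

In a differentiated environment, this estimate could be averaged over pairs drawn from the same class or from different classes. Those averages would give sample values of Pc11 or Pc12.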

Let Pcxy represent the expected value of Pc for two stimuli from classes x and y. Pctx is the expected value of Pc between stimulus St and all stimuli belonging to classes other than 1 and 2. The probability Pc represents a mean or expected value, as did Pa. Pc for a finite number of stimulus exposures will be represented by Pct1(s) and Pct2(s). Pct1 is a measure of the relationship between the test stimulus St and another stimulus of class 1. If Pct1(j) denotes the expected value of Pct1 for the jth unit, there is a resulting distribution of Pct1(j) over the A-units.

A stimulus St associated with R1 will activate PaNe A-units. These units may be thought of as a particular subset of the R1 source-set, the StR1 subset. Suppose another stimulus associated with R1 is shown to the Perceptron. Then Pc11 is the expected proportion of the units in the StR1 subset that will be activated. These units will gain an increment of value (by convention Δv = 1).

Pc11 may also be interpreted as the expected value that an A-unit of this StR1 subset will gain, on the average, upon exposure to another stimulus of class R1. Pc11 represents the expected probability that an A-unit activated by a particular stimulus of class R1 will also be activated by any other stimulus of class R1.

In an analogous manner, Pc12 is the expected probability that an A-unit will respond to a stimulus of class R2, given that it responds to a particular stimulus of class R1.

In view of these interpretations, the expected bias d̄ due to all stimuli other than St may be calculated. Consider nSr to be equal for both sets. Each stimulus associated with R1 will add to the StR1 subset an increment of value equal to the number of units in the subset, PaNe, times the average value that each unit gains from any other stimulus of class R1, namely Pc11.

The increment of value gained by this StR1 set due to a stimulus of class R2 is equal to PaNePc12.

Hence the expected bias d̄ due to all stimuli other than St may be expressed by

$$\bar d = V_1 - V_2 = P_aN_e\,(n_{Sr}-1)\,P_{c11} - P_aN_e\,P_{c12}\,n_{Sr} \tag{84}$$

All stimuli associated with response units other than R1 and R2 contribute an increment of value PaNePc1x to both the R1 and R2 sets, so the net value they add to R1 or R2 effectively cancels out. The above expression for d̄ is therefore general and independent of overlap among source-sets. It may be written in the following form:

$$\bar d = P_aN_e\,(n_{Sr}-1)(P_{c11}-P_{c12}) - P_aN_e\,P_{c12} \tag{85}$$

It is evident that d̄ will not be a small fraction of b̄, as was the case in the ideal environment, but will grow in proportion to nSr, with a coefficient depending on the difference between Pc11 and Pc12.

If Pc11 ≫ Pc12, that is, for sufficiently dissimilar classes of stimuli, a correct response will almost always occur, provided nSr is the same for both R-sets.
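Equations (84) and (85) are two arrangements of the same expected bias. The Python sketch below uses arbitrary illustrative parameter values. It checks that the two forms agree and shows d̄ growing roughly in proportion to nSr when Pc11 > Pc12. The function names are hypothetical.

```python
def alpha_bias(p_a, n_e, n_sr, pc11, pc12):
    """Expected bias from (84): uniform nSr, differentiated environment."""
    return p_a * n_e * (n_sr - 1) * pc11 - p_a * n_e * pc12 * n_sr


def alpha_bias_rearranged(p_a, n_e, n_sr, pc11, pc12):
    """The same quantity written as in (85)."""
    return p_a * n_e * (n_sr - 1) * (pc11 - pc12) - p_a * n_e * pc12


if __name__ == "__main__":
    for n_sr in (10, 100, 1000):
        print(n_sr, alpha_bias(0.05, 1000, n_sr, pc11=0.4, pc12=0.1))
```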

Assuming non-uniform nSr, then for the Alpha system in a differentiated environment

$$\bar d = P_aN_e\,(n_{S1}P_{c11} - n_{S2}P_{c12}) \tag{86}$$

In the following analysis, Pr will be evaluated in terms of the system parameters for the Alpha system with uniform nSr for all responses in a differentiated environment. First consider the sum-discriminating system.

Let

$$x^r(a_j,K) = \begin{cases} 1 & \text{if the } K\text{th stimulus of the } r\text{th class, a member of the } r\text{th source-set, activates the } a_j \text{ cell,}\\ & \text{under the condition that the } a_j \text{ cell is activated by the test stimulus } S_t,\\ 0 & \text{under all other conditions.} \end{cases}$$

Then the bias component d1 due to the R1 source-set is

$$d_1 = \sum_{j=1}^{N_e}\sum_{K=1}^{n_{Sr}}\left[x^1(a_j,K)+\sum_{r(a_j)}x^r(a_j,K)\right] \tag{87}$$

where r(aj) goes through all response units to which aj is connected, except R1 and R2.

For simplicity in the following analytical development, let Yj represent the value gained by unit aj from all stimuli (the sum over K). Then

$$Y_j = \sum_{K=1}^{n_{Sr}}\left[x^1(a_j,K)+\sum_{r(a_j)}x^r(a_j,K)\right] \tag{88}$$


The variance of d1 given St, a fixed member of class R1, assuming that the values of different A-units are independent, is

$$\sigma^2(d_1\,|\,t) = \sum_{j=1}^{N_e}\sigma^2\!\left\{\sum_{K=1}^{n_{Sr}}\left[x^1(a_j,K)+\sum_{r(a_j)}x^r(a_j,K)\right]\right\} = \sum_{j=1}^{N_e}\sigma^2(Y_j) \tag{89}$$


Yj = 0 with a probability of 1 - Pa, that is, if the unit aj is not activated by St; aj is activated by St with a probability of Pa. The conditional expectation of Yj given that St activates aj is denoted by E(Yj|t), the conditional second moment by E(Yj²|t), and the conditional variance by σ²(Yj|t).

The first and second moments and the total variance are

$$E\,Y_j = P_a\,E(Y_j\,|\,t)$$

$$E\,Y_j^2 = P_a\,E(Y_j^2\,|\,t)$$

$$\sigma^2(Y_j) = E\,Y_j^2-(E\,Y_j)^2$$

Substituting in terms of the conditional expectations of Yj, and introducing a second and third term which sum to zero, yields:

$$\begin{aligned}
\sigma^2(Y_j) &= P_a\,E(Y_j^2\,|\,t) - P_a[E(Y_j\,|\,t)]^2 + P_a[E(Y_j\,|\,t)]^2 - [P_a\,E(Y_j\,|\,t)]^2\\
&= P_a\,\sigma^2(Y_j\,|\,t) + P_a(1-P_a)\,[E(Y_j\,|\,t)]^2
\end{aligned}\tag{90}$$
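Equation (90) is the usual decomposition of a total variance into conditional parts, here applied to a variable that is zero unless aj is activated by St. The Python sketch below uses an arbitrary illustrative conditional distribution. It computes the variance of such a variable directly and again through (90). The function names are hypothetical.

```python
def variance_direct(p_a, cond_values, cond_probs):
    """Var(Y) when Y = 0 with probability 1 - Pa and, with probability Pa,
    Y follows the conditional distribution (cond_values, cond_probs)."""
    mean = p_a * sum(v * q for v, q in zip(cond_values, cond_probs))
    second = p_a * sum(v * v * q for v, q in zip(cond_values, cond_probs))
    return second - mean ** 2


def variance_by_eq90(p_a, cond_values, cond_probs):
    """Same quantity via (90): Pa*var(Y|t) + Pa*(1 - Pa)*[E(Y|t)]^2."""
    m = sum(v * q for v, q in zip(cond_values, cond_probs))
    var = sum((v - m) ** 2 * q for v, q in zip(cond_values, cond_probs))
    return p_a * var + p_a * (1.0 - p_a) * m ** 2


if __name__ == "__main__":
    vals, probs = [0, 1, 2, 5], [0.1, 0.4, 0.3, 0.2]
    print(variance_direct(0.3, vals, probs), variance_by_eq90(0.3, vals, probs))
```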

In order to simplify notation, let the conditional value of Yj given t be represented by Yj itself; then E(Yj|t) = E(Yj), E(Yj²|t) = E(Yj²), and so on.

For different exposures, the contributions made to the conditional Yj (given that aj is activated by St) are independent. The variance of the sum over K is then the sum of the variances, and, assuming nSr to be fixed, the variance of Yj is

$$\sigma^2(Y_j\,|\,t) = n_{Sr}\,\sigma^2\!\left[x^1(a_j,1)+\sum_{r(a_j)}x^r(a_j,1)\right] = n_{Sr}\left\{\sigma^2[x^1(a_j,1)]+\sigma^2\!\left[\sum_{r(a_j)}x^r(a_j,1)\right]\right\} \tag{91}$$

Before actually evaluating σ²(d1|t), the above variances must be evaluated.

The expected value that unit aj will gain, on the average, from a stimulus of class one, given that aj is activated by St, is Pct1(j). The same type of reasoning as was used in deriving E(Pa²) can be used for the second moment, which gives E{[x^1(aj,1)]²} = Pct1(j) (see page 22). It then follows that

$$\sigma^2[x^1(a_j,1)] = E\{[x^1(a_j,1)]^2\} - \{E[x^1(a_j,1)]\}^2 = P_{ct1}(j) - P_{ct1}(j)^2 \tag{92}$$

and

$$\sigma^2\!\left[\sum_{r(a_j)}x^r(a_j,1)\right] = \sum_{r(a_j)}\sigma^2[x^r(a_j,1)] = \sum_{r(a_j)}\left[P_{ctr}(j) - P_{ctr}(j)^2\right] \tag{93}$$


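Making the proper substitutions, the variance of d1 given t is

$$\begin{aligned}
\sigma^2(d_1\,|\,t) &= \sum_{j=1}^{N_e}\left\{P_a\,n_{Sr}\,\sigma^2\!\left[x^1(a_j,1)+\sum_{r(a_j)}x^r(a_j,1)\right] + P_a(1-P_a)\,n_{Sr}^2\left[E\left(x^1(a_j,1)+\sum_{r(a_j)}x^r(a_j,1)\right)\right]^2\right\}\\
&= \sum_{j=1}^{N_e}\left\{P_a\,n_{Sr}\left[P_{ct1}(j)-P_{ct1}(j)^2+\sum_{r(a_j)}\left(P_{ctr}(j)-P_{ctr}(j)^2\right)\right] + P_a(1-P_a)\,n_{Sr}^2\left[P_{ct1}(j)^2+\left(\sum_{r(a_j)}P_{ctr}(j)\right)^2+2P_{ct1}(j)\sum_{r(a_j)}P_{ctr}(j)\right]\right\}
\end{aligned}\tag{94}$$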

Now assume that

$$\sum_{r(a_j)}P_{ctr}(j) = (N_{Ra}-1)\,P_{ctx}(j)$$

where Pctx(j) is the mean value of Pc measured for unit aj between stimulus St and all stimuli of classes other than 1 and 2. Then

$$\begin{aligned}
\sigma^2(d_1\,|\,t) &= P_a\,n_{Sr}N_e\left[P_{ct1}-P_{ct1}^2-\sigma_j^2(P_{ct1}) + (N_{Ra}-1)\left(P_{ctx}-P_{ctx}^2-\sigma_j^2(P_{ctx})\right)\right]\\
&\quad + P_a(1-P_a)\,n_{Sr}^2N_e\left[P_{ct1}^2+\sigma_j^2(P_{ct1}) + (N_{Ra}-1)^2\left(P_{ctx}^2+\sigma_j^2(P_{ctx})\right) + 2(N_{Ra}-1)\left(P_{ct1}P_{ctx}+\delta\right)\right]
\end{aligned}\tag{95}$$

where the cross-product term δ, the covariance of Pct1(j) and Pctx(j) over the units, is assumed to be negligible. Pct1(j), the probability of the aj unit responding to both St and another stimulus of class 1, will constitute a distribution over j whose variance is σj²(Pct1). Similarly, σj²(Pctx) is the variance associated with the distribution of Pctx(j) over the set of A-units. The variances σj²(Pct1) and σj²(Pctx) will be considered as empirical values to be measured for any particular case in question, since they have not as yet yielded to an analytic approach.

The values of Pc form a crude approximation to a normal distribution. An estimate of the standard deviation is that it would be equal to half the expected value of the variable. An experiment conducted by Dr. Rosenblatt showed that this is a conservative estimate.

Now consider St to be any stimulus of the first class. The expected value of d1 given St under these conditions is obtained by modifying the previous conditional expectation by a factor of Pa, the probability that aj will be activated by St, in the following manner:

$$E(d_1\,|\,t) = n_{Sr}\sum_{j=1}^{N_e}\left[P_a\,P_{ct1}(j)+\sum_{r(a_j)}P_a\,P_{ctr}(j)\right] = P_a\,n_{Sr}N_e\left[P_{ct1}+(N_{Ra}-1)\,P_{ctx}\right] \tag{96}$$

From formula (52), the total variance of d1 in terms of the conditional variance is

$$\begin{aligned}
\sigma^2(d_1) &= E_t\,\sigma^2(d_1\,|\,t) + \sigma_t^2(E\,d_1\,|\,t)\\
&= P_a\,n_{Sr}N_e\left[P_{c11}-P_{c11}^2-\sigma_s^2(P_{c11})-\sigma_j^2(P_{c11}) + (N_{Ra}-1)\left(P_{c1x}-P_{c1x}^2-\sigma_s^2(P_{c1x})-\sigma_j^2(P_{c1x})\right)\right]\\
&\quad + P_a(1-P_a)\,n_{Sr}^2N_e\left[P_{c11}^2+\sigma_s^2(P_{c11})+\sigma_j^2(P_{c11}) + (N_{Ra}-1)^2\left(P_{c1x}^2+\sigma_s^2(P_{c1x})+\sigma_j^2(P_{c1x})\right) + 2(N_{Ra}-1)\,P_{c11}P_{c1x}\right]\\
&\quad + P_a^2\,n_{Sr}^2N_e^2\left[\sigma_s^2(P_{c11}) + (N_{Ra}-1)^2\,\sigma_s^2(P_{c1x}) + 2(N_{Ra}-1)\,\epsilon\right]
\end{aligned}\tag{97}$$

where σs²(Pc11) and σs²(Pc1x) represent the variances of Pct1 and Pctx, respectively, taken over all test stimuli St of the set, and ε is the covariance of Pct1 and Pctx, which will be assumed to be negligible. The variances with subscript s may be considered as empirical quantities to be measured for the case in question. The variance of Pct1 depends on the shape of the stimuli of class one. If the stimuli of this class are all the same and uniformly distributed over the infinite retina of the sensory system, then Pct1 will be identical for any stimulus of the class chosen as the test stimulus, and its variance is zero. However, if the stimuli of the given class vary widely in shape and in their distribution on the retina, the variance of Pct1 may be considerable. The variance of the bias component d2 of the R2 source-set will be equal to that of the R1 set given by equation (97), with Pc11 replaced by Pc12.

Now the probability of correct response of the Alpha Perceptron with sum discrimination under a differentiated environment can be written as follows:

$$P_{r(\Sigma)} = \left[1-(1-P_a)^{N_e}\right]\frac{1}{\sqrt{2\pi}}\int_{-Z}^{\infty}e^{-t^2/2}\,dt$$

$$\text{where}\quad Z = \frac{P_aN_e\left[1+(n_{Sr}-1)(P_{c11}-P_{c12})-P_{c12}\right]}{\sqrt{\sigma^2(d_1)+\sigma^2(d_2)}} \tag{98}$$

Now examine Z in the above expression. The numerator of Z is linear in nSr, and the square of the denominator contains two components, each of which contains two additive components, one proportional to nSr and the other proportional to nSr², for a given Perceptron. Thus Z takes on the form

$$Z = \frac{c_1+c_2\,n_{Sr}}{\sqrt{c_3\,n_{Sr}+c_4\,n_{Sr}^2}}$$

which can be written

$$Z = \frac{c_1/n_{Sr}+c_2}{\sqrt{c_3/n_{Sr}+c_4}}$$

where the c's are constants. Consequently Z will approach a limit of c2/√c4 as the number of stimuli associated with each response, nSr, increases. The importance of this is that the Perceptron will approach a better-than-chance limit for the probability of correct response with increasing experience.
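The limiting behaviour of Z can be seen numerically. In the Python sketch below, the constants c1 to c4 are arbitrary illustrative values, not values derived from any particular Perceptron. The sketch compares Z(nSr) with its limit c2/√c4 and evaluates the normal integral of (98) at that limit, taking the factor 1 - (1 - Pa)^Ne as 1. The function names are hypothetical.

```python
import math


def z_of_n(n_sr, c1, c2, c3, c4):
    """Z = (c1 + c2*nSr) / sqrt(c3*nSr + c4*nSr^2), the form discussed above."""
    return (c1 + c2 * n_sr) / math.sqrt(c3 * n_sr + c4 * n_sr ** 2)


def z_limit(c2, c4):
    """Limit of Z as nSr grows without bound."""
    return c2 / math.sqrt(c4)


def normal_cdf(z):
    """Standard normal distribution function, equal to the integral in (98)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    c1, c2, c3, c4 = 1.0, 0.3, 2.0, 0.5  # hypothetical constants
    for n in (10, 100, 1000, 10 ** 6):
        print(n, z_of_n(n, c1, c2, c3, c4))
    print("limit:", z_limit(c2, c4), "limiting Phi(Z):", normal_cdf(z_limit(c2, c4)))
```

Comparing with (98), c2 corresponds to PaNe(Pc11 - Pc12). A positive c2, that is Pc11 > Pc12, therefore gives a limiting probability above one half.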


Alpha  Perceptron  for  Mean  Discrimination 

Before studying the results of Pr(Σ), the probability of correct response for mean discrimination will be considered. The expected net bias B̄ for the mean system is equal to B̄ for the sum system divided by the number of units activated, on the average, in any source-set by any stimulus, namely PaNe. Then

$$\bar B = 1+(n_{Sr}-1)(P_{c11}-P_{c12})-P_{c12} \tag{99}$$

The variance of d1 given St for this system is

$$\sigma^2(d_1\,|\,t) = \sigma^2\!\left[\frac{1}{N_{a1(t)}}\sum_{j=1}^{N_{a1(t)}}Y_j\right] \tag{100}$$

where Na1(t) is the number of A-units activated by St in the R1 source-set, and


$$E(d_1\,|\,t) = E\left[\frac{1}{N_{a1(t)}}\sum_{j=1}^{N_{a1(t)}}Y_j\right] = \sum_{K}\sum_{\mathcal{J}}P_a^K(1-P_a)^{N_e-K}\,\frac{1}{K}\sum_{j\in\mathcal{J}}E\,Y_j \tag{101}$$

where

Pa^K(1 - Pa)^(Ne - K) = the probability that the particular combination occurs in which only K out of Ne units are activated by St;

(1/K) Σ E Yj = the average value an A-unit in the K set gains by activation from St. For a given partition K, Ne - K, there are

$$\binom{N_e-1}{K-1} = \frac{(N_e-1)!}{(K-1)!\,(N_e-K)!}$$

different ways in which K - 1 active units can be selected from Ne - 1 units;

𝒥 = the class of all possible partitions of the Ne units into K, Ne - K.

The possibility that K = 0, that is, that no unit responds to St, has been excluded, so that K can range from 1 to Ne. Then, since (Ne - 1 choose K - 1)/K = (Ne choose K)/Ne,

$$E(d_1\,|\,t) = \sum_{K=1}^{N_e}\binom{N_e-1}{K-1}P_a^K(1-P_a)^{N_e-K}\,\frac{1}{K}\sum_{j=1}^{N_e}E\,Y_j = \frac{1}{N_e}\sum_{j=1}^{N_e}E\,Y_j\sum_{K=1}^{N_e}\binom{N_e}{K}P_a^K(1-P_a)^{N_e-K} \tag{102}$$


Now consider the second summation. The probability of the particular combination in which K = 0 is (1 - Pa)^Ne. If K = 0 were admitted, the sum over all combinations would equal 1. Thus the above sum is

$$\sum_{K=1}^{N_e}P_a^K(1-P_a)^{N_e-K}\binom{N_e}{K} = 1-(1-P_a)^{N_e}$$

Since Ne is large, the sum is approximately equal to 1, in which case

$$E(d_1\,|\,t) \approx \sum_{\mathcal{J}}\frac{E\,Y_j}{N_e} = \frac{1}{N_e}\sum_{j=1}^{N_e}E\,Y_j \tag{103}$$

and the 𝒥-index may be replaced by the j-index, since the sum reduces to the average value over an average set.
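Two combinatorial facts carry the step from (101) to (103). The first is the identity (Ne - 1 choose K - 1)/K = (Ne choose K)/Ne. The second is that the binomial sum over K ≥ 1 equals 1 - (1 - Pa)^Ne. The Python sketch below checks both with `math.comb` for illustrative values of Pa and Ne. The function names are hypothetical.

```python
from math import comb


def coefficient_identity_holds(n_e):
    """Check C(Ne-1, K-1) * Ne == C(Ne, K) * K for K = 1..Ne (exact integers)."""
    return all(comb(n_e - 1, k - 1) * n_e == comb(n_e, k) * k for k in range(1, n_e + 1))


def prob_some_unit_active(p_a, n_e):
    """Sum over K = 1..Ne of C(Ne, K) Pa^K (1 - Pa)^(Ne - K)."""
    return sum(comb(n_e, k) * p_a ** k * (1 - p_a) ** (n_e - k) for k in range(1, n_e + 1))


if __name__ == "__main__":
    p_a, n_e = 0.05, 200
    print(coefficient_identity_holds(n_e))
    print(prob_some_unit_active(p_a, n_e), 1 - (1 - p_a) ** n_e)
```

For Ne = 200 and Pa = 0.05 the sum already differs from 1 by only about 3.5 × 10^-5, which is why the text replaces it by 1.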

Similarly, the second moment of d1, given that aj is activated by St, will be

$$E(d_1^2\,|\,t) = E\left[\frac{1}{N_{a1(t)}}\sum_{j=1}^{N_{a1(t)}}Y_j\right]^2 = E\left[\frac{1}{N_{a1(t)}^2}\sum_{i,j=1}^{N_{a1(t)}}Y_iY_j\right] \tag{104}$$

where

$$E\left[\frac{1}{N_{a1(t)}^2}\sum_{i,j=1}^{N_{a1(t)}}Y_iY_j\right] = \sum_{K}\sum_{\mathcal{J}}P_a^K(1-P_a)^{N_e-K}\,\frac{1}{K^2}\sum_{i,j\in\mathcal{J}}E\,Y_iY_j$$

To simplify notation, let

$$P' = P_a^K(1-P_a)^{N_e-K}$$

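Then the conditional second moment will be

$$E(d_1^2\,|\,t) = \sum_{K}\sum_{\mathcal{J}}\frac{P'}{K^2}\left(\sum_{j\in\mathcal{J}}E\,Y_j^2+\sum_{\substack{i,j\in\mathcal{J}\\ i\neq j}}E\,Y_iY_j\right) \tag{105}$$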

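Introducing a second and third term which sum to zero, and using E YiYj = E Yi E Yj for i ≠ j (the values of different A-units being independent), yields:

$$\begin{aligned}
E(d_1^2\,|\,t) &= \sum_{K}\sum_{\mathcal{J}}\frac{P'}{K^2}\left\{\sum_{j\in\mathcal{J}}\left[E\,Y_j^2-(E\,Y_j)^2\right]+\sum_{j\in\mathcal{J}}(E\,Y_j)^2+\sum_{\substack{i,j\in\mathcal{J}\\ i\neq j}}E\,Y_i\,E\,Y_j\right\}\\
&= \sum_{K}\sum_{\mathcal{J}}\frac{P'}{K^2}\left[\sum_{j\in\mathcal{J}}\sigma^2(Y_j)+\sum_{i,j\in\mathcal{J}}E\,Y_i\,E\,Y_j\right]
\end{aligned}\tag{106}$$

Evaluating the sum over 𝒥 gives

$$E(d_1^2\,|\,t) = \sum_{j=1}^{N_e}\sigma^2(Y_j)\sum_{K=1}^{N_e}\frac{P'}{K^2}\binom{N_e-1}{K-1} + \sum_{K=1}^{N_e}\frac{P'}{K^2}\left\{\binom{N_e-2}{K-2}\left[\left(\sum_{j=1}^{N_e}E\,Y_j\right)^2-\sum_{j=1}^{N_e}(E\,Y_j)^2\right]+\binom{N_e-1}{K-1}\sum_{j=1}^{N_e}(E\,Y_j)^2\right\} \tag{107}$$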


where the quantity in brackets in the second term represents the cross-product terms resulting from the second term of the previous expression.

By the definition of E Yj, aj must be active, and the cross-product terms specify that ai ≠ aj, so that ai must be active also. Thus at least two units must be active, and the number of different combinations is (Ne - 2 choose K - 2).

For the squared term (E Yj)², only aj must be active, so the corresponding number of different combinations for this term is (Ne - 1 choose K - 1).

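The conditional second moment of d1 may be written as follows:

$$\begin{aligned}
E(d_1^2\,|\,t) &= \sum_{j=1}^{N_e}\sigma^2(Y_j)\sum_{K=1}^{N_e}\frac{P'}{K^2}\binom{N_e-1}{K-1} + \left(\sum_{j=1}^{N_e}E\,Y_j\right)^2\sum_{K=2}^{N_e}\frac{P'}{K^2}\binom{N_e-2}{K-2}\\
&\quad + \sum_{j=1}^{N_e}(E\,Y_j)^2\left[\sum_{K=1}^{N_e}\frac{P'}{K^2}\binom{N_e-1}{K-1}-\sum_{K=2}^{N_e}\frac{P'}{K^2}\binom{N_e-2}{K-2}\right]
\end{aligned}\tag{108}$$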

Simplification of the coefficients of the terms proceeds as follows:

$$\sum_{K=1}^{N_e}\frac{P'}{K^2}\binom{N_e-1}{K-1} = \sum_{K=1}^{N_e}\frac{P'}{K^2}\,\frac{(N_e-1)!}{(K-1)!\,(N_e-K)!} = \sum_{K=1}^{N_e}\frac{P'\,N_e!}{N_eK\;K!\,(N_e-K)!} = \frac{1}{N_e}\sum_{K=1}^{N_e}\frac{P'}{K}\binom{N_e}{K} \tag{109}$$

Let

$$Q = \sum_{K=1}^{N_e}\frac{P'}{K}\binom{N_e}{K} = \sum_{K=1}^{N_e}\frac{1}{K}\,P_a^K(1-P_a)^{N_e-K}\binom{N_e}{K}$$

Therefore

$$\sum_{K=1}^{N_e}\frac{P'}{K^2}\binom{N_e-1}{K-1} = \frac{Q}{N_e}$$

where Q ≈ 1/(PaNe).

For the coefficient of the second term:

$$\begin{aligned}
\sum_{K=2}^{N_e}\frac{P'}{K^2}\binom{N_e-2}{K-2} &= \sum_{K=2}^{N_e}\frac{P'}{K^2}\,\frac{(N_e-2)!}{(K-2)!\,(N_e-K)!} = \sum_{K=2}^{N_e}\frac{P'(K-1)}{K}\,\frac{1}{N_e(N_e-1)}\binom{N_e}{K}\\
&= \frac{1}{N_e(N_e-1)}\sum_{K=2}^{N_e}\left(1-\frac{1}{K}\right)P'\binom{N_e}{K} \approx \frac{1}{N_e(N_e-1)}\,(1-Q)
\end{aligned}\tag{110}$$
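The coefficients in (109) and (110) can be checked numerically, as in the Python sketch below. The sketch computes Q exactly, compares it with the approximation 1/(PaNe), and verifies the two coefficient identities. In (110) the exact right-hand side contains 1 - (1 - Pa)^Ne where the text uses 1. Parameter values and function names are illustrative assumptions.

```python
import math


def p_prime(p_a, n_e, k):
    """P' = Pa^K (1 - Pa)^(Ne - K)."""
    return p_a ** k * (1 - p_a) ** (n_e - k)


def q_exact(p_a, n_e):
    """Q = sum over K = 1..Ne of (1/K) P' C(Ne, K)."""
    return sum(p_prime(p_a, n_e, k) * math.comb(n_e, k) / k for k in range(1, n_e + 1))


def first_coefficient(p_a, n_e):
    """Left side of (109): sum over K >= 1 of P'/K^2 * C(Ne-1, K-1)."""
    return sum(p_prime(p_a, n_e, k) / k ** 2 * math.comb(n_e - 1, k - 1)
               for k in range(1, n_e + 1))


def second_coefficient(p_a, n_e):
    """Left side of (110): sum over K >= 2 of P'/K^2 * C(Ne-2, K-2)."""
    return sum(p_prime(p_a, n_e, k) / k ** 2 * math.comb(n_e - 2, k - 2)
               for k in range(2, n_e + 1))


if __name__ == "__main__":
    p_a, n_e = 0.1, 400
    q = q_exact(p_a, n_e)
    print("Q =", q, " 1/(PaNe) =", 1 / (p_a * n_e))
    print(first_coefficient(p_a, n_e), q / n_e)
    print(second_coefficient(p_a, n_e), (1 - (1 - p_a) ** n_e - q) / (n_e * (n_e - 1)))
```

For PaNe = 40 the exact Q exceeds 1/(PaNe) by only a few per cent. This supports the approximation Q ≈ 1/(PaNe) used later.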

A combination of the above two terms yields the coefficient of the third term:

$$\sum_{K=1}^{N_e}\frac{P'}{K^2}\binom{N_e-1}{K-1}-\sum_{K=2}^{N_e}\frac{P'}{K^2}\binom{N_e-2}{K-2} = \frac{Q}{N_e}-\frac{1-Q}{N_e(N_e-1)} = \frac{QN_e-1}{N_e(N_e-1)} \tag{111}$$

Substituting these coefficients in expression (108) for the second moment of d1 given t gives

$$E(d_1^2\,|\,t) = \frac{Q}{N_e}\sum_{j=1}^{N_e}\sigma^2(Y_j) + \frac{1-Q}{N_e(N_e-1)}\left(\sum_{j=1}^{N_e}E\,Y_j\right)^2 + \frac{QN_e-1}{N_e(N_e-1)}\sum_{j=1}^{N_e}(E\,Y_j)^2 \tag{112}$$


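Now the total variance of d1 given t may be calculated:

$$\sigma^2(d_1\,|\,t) = E(d_1^2\,|\,t)-[E(d_1\,|\,t)]^2 = \frac{Q}{N_e}\sum_{j=1}^{N_e}\sigma^2(Y_j) + \frac{1-Q}{N_e(N_e-1)}\left(\sum_{j=1}^{N_e}E\,Y_j\right)^2 + \frac{QN_e-1}{N_e(N_e-1)}\sum_{j=1}^{N_e}(E\,Y_j)^2 - \frac{1}{N_e^2}\left(\sum_{j=1}^{N_e}E\,Y_j\right)^2$$

Combining terms:

$$\sigma^2(d_1\,|\,t) = \frac{Q}{N_e}\sum_{j=1}^{N_e}\sigma^2(Y_j) + \frac{QN_e-1}{N_e(N_e-1)}\sum_{j=1}^{N_e}(E\,Y_j)^2 - \frac{QN_e-1}{N_e^2(N_e-1)}\left(\sum_{j=1}^{N_e}E\,Y_j\right)^2 \tag{113}$$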


Substituting the required expressions that were calculated for the sum system results in the following:

$$\begin{aligned}
\sigma^2(d_1\,|\,t) &= Q\,n_{Sr}\left[P_{ct1}-P_{ct1}^2-\sigma_j^2(P_{ct1}) + (N_{Ra}-1)\left(P_{ctx}-P_{ctx}^2-\sigma_j^2(P_{ctx})\right)\right]\\
&\quad + \frac{N_eQ-1}{N_e-1}\,n_{Sr}^2\left[P_{ct1}^2+\sigma_j^2(P_{ct1}) + (N_{Ra}-1)^2\left(P_{ctx}^2+\sigma_j^2(P_{ctx})\right) + 2(N_{Ra}-1)\,P_{ct1}P_{ctx} - \left(P_{ct1}+(N_{Ra}-1)\,P_{ctx}\right)^2\right]\\
&= Q\,n_{Sr}\left[P_{ct1}-P_{ct1}^2-\sigma_j^2(P_{ct1}) + (N_{Ra}-1)\left(P_{ctx}-P_{ctx}^2-\sigma_j^2(P_{ctx})\right)\right] + \frac{N_eQ-1}{N_e-1}\,n_{Sr}^2\left[\sigma_j^2(P_{ct1}) + (N_{Ra}-1)^2\,\sigma_j^2(P_{ctx})\right]
\end{aligned}\tag{114}$$


Letting St be any stimulus of class one, the conditional expectation of d1 given t and its variance are, respectively:

$$E(d_1\,|\,t) = n_{Sr}\left[P_{ct1}+(N_{Ra}-1)\,P_{ctx}\right]$$

$$\sigma_t^2(E\,d_1\,|\,t) = n_{Sr}^2\left[\sigma_s^2(P_{c11}) + (N_{Ra}-1)^2\,\sigma_s^2(P_{c1x}) + 2(N_{Ra}-1)\,\epsilon\right] \tag{115}$$

where, as before, ε represents the covariance of Pct1 and Pctx, which will be assumed negligible. Making the proper substitutions from the previously derived expressions, and assuming the approximation Q ≈ 1/(PaNe), the total variance of d1 is given by the general equation

$$\begin{aligned}
\sigma^2(d_1) &= E_t\,\sigma^2(d_1\,|\,t) + \sigma_t^2(E\,d_1\,|\,t)\\
&= \frac{n_{Sr}}{P_aN_e}\left[P_{c11}-P_{c11}^2-\sigma_s^2(P_{c11})-\sigma_j^2(P_{c11}) + (N_{Ra}-1)\left(P_{c1x}-P_{c1x}^2-\sigma_s^2(P_{c1x})-\sigma_j^2(P_{c1x})\right)\right]\\
&\quad + \frac{1-P_a}{P_a(N_e-1)}\,n_{Sr}^2\left[\sigma_j^2(P_{c11}) + (N_{Ra}-1)^2\,\sigma_j^2(P_{c1x})\right] + n_{Sr}^2\left[\sigma_s^2(P_{c11}) + (N_{Ra}-1)^2\,\sigma_s^2(P_{c1x})\right]
\end{aligned}\tag{116}$$

The standard deviation of d for the mean system is equal to

$$\sigma(d_\mu) = \sqrt{\sigma^2(d_1)+\sigma^2(d_2)}$$

where σ²(d2) is equal to σ²(d1) with Pc11 replaced by Pc12.

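Then the probability of correct response for the mean system is given by

$$P_{r(\mu)} = \left[1-(1-P_a)^{N_e}\right]\frac{1}{\sqrt{2\pi}}\int_{-Z}^{\infty}e^{-t^2/2}\,dt$$

$$\text{where}\quad Z = \frac{(n_{Sr}-1)(P_{c11}-P_{c12})+(1-P_{c12})}{\sqrt{\sigma^2(d_1)+\sigma^2(d_2)}} \tag{117}$$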

To further study the capacity and capabilities of the Alpha Perceptron, consider Pd, the probability that two stimuli associated with two different response units during the learning period will be correctly discriminated in the test period.

The equations for Pr gave an analytical indication of the correctness of response for one test stimulus, while Pd indicates the correctness of discrimination between stimuli.

If the symbols were redefined, the correctness of response to St2, a test stimulus associated with R2, could be determined. Assuming the Pr's to be independent, the probability of correct discrimination, Pd, would then be equal to the product of the Pr's for St1 and St2.

With this idea in mind, let some symbols be examined closely. By convention, the StR1 subset is the set of units in the R1 source-set that are activated by St. In this subset each unit gains one unit of value from one exposure of St.

The expected net bias d̄ was due to all stimuli other than St1, which (assuming nsr to be uniform) means that (nsr − 1) stimuli were associated with R1 and nsr stimuli were associated with R2. The resulting mean value per unit due to the unbalanced association toward R2 is then equal to Pc12. This follows because Pc12 may be interpreted as the average value gained by a unit of the StRi set from one stimulus associated with R2.

It follows that the net expected reinforcement bias due to St1 is equal to 1 − Pc12.
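The counting argument of the two preceding paragraphs can be written out as a short derivation. It is a sketch that assumes, as stated above, that each exposure adds one unit of value to every unit it activates, and it compares the average value of the units activated by St1 in the R1 and R2 source-sets; the result is the numerator of Z in equation (117).

$$
\begin{aligned}
E[\text{value of a unit activated by } S_{t1} \text{ in the } R_1 \text{ source-set}] &= 1 + (n_{sr} - 1)P_{c11} \\
E[\text{value of a unit activated by } S_{t1} \text{ in the } R_2 \text{ source-set}] &= n_{sr}P_{c12} \\
\bar{d} = 1 + (n_{sr} - 1)P_{c11} - n_{sr}P_{c12} &= (n_{sr} - 1)(P_{c11} - P_{c12}) + (1 - P_{c12})
\end{aligned}
$$

The term (1 − Pc12) is the known reinforcement bias; the first term is the bias contributed by all the other reinforced stimuli.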

With slight modification, various degrees of relationship can be introduced between the two known stimuli St1 and St2, where St1 denotes the test stimulus associated with R1, and St2 the test stimulus associated with R2.

If the unbalanced reinforcement bias toward R2, Pc12, is replaced by Pct1t2 (the expected value of Pc between St1 and St2), the resulting equation for correct response holds under the assumption that St2 is one of the stimuli associated with R2. This equation, in which St1 is the R1 test stimulus and St2 the oppositely associated R2 stimulus, will be denoted by Pr(t1). Pr(t2) will represent the corresponding equation in which St2 is the test stimulus of R2, and St1 is the oppositely associated stimulus of R1.

Assuming the Pr's to be independent, the probability that both known stimuli are associated correctly is the product of the individual probabilities of correct response, and this product is the probability of correct discrimination, Pd:

$$P_r(t_1)\,P_r(t_2) = P_d \qquad (118)$$

Thus, when St1 and St2 have a specified difference measured by Pct1t2 and Pct2t1, Pd represents the correctness of response to both known stimuli during the test period.

Now if Pct1t2 = Pct2t1, and both stimuli are the same size, the equation for Pr(t2) is the same as that for Pr(t1) with Pc11 replaced by Pc22 in the Z expression. Subject to the restrictions of uniform stimulus size and uniform nsr, Pd can be written for the sum system as

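$$P_d(\Sigma) = \frac{1}{2\pi} \int_{-\infty}^{Z_1} e^{-t^2/2}\,dt \int_{-\infty}^{Z_2} e^{-t^2/2}\,dt \qquad (119)$$

where

$$Z_1 = \frac{P_a N_e \left[(n_{sr} - 1)(P_{c11} - P_{c12}) + (1 - P_{ct_1t_2})\right]}{\sigma_{d(\Sigma)1}}$$

Z2 is Z1 with Pc11 replaced by Pc22 and σd(Σ)1 by σd(Σ)2, and

σd(Σ)1 = positive square root of equation (97),
σd(Σ)2 = positive square root of equation (97) with Pc11 replaced by Pc12.

Similarly, for the mean discriminating system:

$$P_d(M) = \frac{1}{2\pi} \int_{-\infty}^{Z_1} e^{-t^2/2}\,dt \int_{-\infty}^{Z_2} e^{-t^2/2}\,dt \qquad (120)$$

where

$$Z_1 = \frac{(n_{sr} - 1)(P_{c11} - P_{c12}) + (1 - P_{ct_1t_2})}{\sigma_{d(M)1}}$$

Z2 is Z1 with Pc11 replaced by Pc22 and σd(M)1 by σd(M)2, and

σd(M)1 = positive square root of equation (116),
σd(M)2 = positive square root of equation (116) with Pc11 replaced by Pc22.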
It may be noted that as Pct1t2 becomes increasingly large, 1 − Pct1t2, the expected known reinforcement bias, approaches zero. Provided there were no other stimuli of other classes for reinforcement, the expression for Z would then approach zero, an asymptote of chance response.
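Equations (118) to (120) can be evaluated numerically by noting that the double integral divided by 2π factors into a product of two standard normal distribution values. The Python sketch below does this under stated assumptions: the standard deviations are supplied by the caller (from equation (97) or (116)), the factor 1 − (1 − Pa)^Ne of equation (117) is omitted (it is close to unity for large Ne), and the names z_value and pd are hypothetical. The scale argument is PaNe for the sum system and 1 for the mean system.

```python
import math


def std_normal_cdf(z: float) -> float:
    """(1/sqrt(2*pi)) * integral from -inf to z of exp(-t**2/2) dt."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def z_value(scale: float, nsr: int, pc_own: float, pc12: float,
            pct1t2: float, sigma: float) -> float:
    """Z1 of (119)/(120); pass pc_own = Pc22 and the second sigma for Z2.

    scale is Pa*Ne for the sum system and 1.0 for the mean system.
    """
    known_bias = 1.0 - pct1t2  # expected known reinforcement bias
    return scale * ((nsr - 1) * (pc_own - pc12) + known_bias) / sigma


def pd(z1: float, z2: float) -> float:
    """Equation (118): Pd = Pr(t1) * Pr(t2), each Pr taken as Phi(Z)."""
    return std_normal_cdf(z1) * std_normal_cdf(z2)


# Limiting case discussed above: no other reinforced stimuli (nsr = 1) and
# Pct1t2 = 1, so both Z values vanish and each response is at chance.
z1 = z_value(1.0, 1, 0.3, 0.2, 1.0, 0.5)
z2 = z_value(1.0, 1, 0.3, 0.2, 1.0, 0.5)
print(pd(z1, z2))  # 0.25
```

At chance each Pr is 0.5, so the probability of discriminating both stimuli correctly is 0.25 rather than 0.5.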
The above equations assume that St1 can be any stimulus of the class 1 set and St2 may likewise be any stimulus of class 2; that is, all the stimuli within a class are highly correlated with each other. For this reason Pc11, Pc12, Pc22, etc., are used in the numerator of Z rather than the specific values Pct11, Pct12, Pct22, etc. However, when Pc11 is not typical of the specific value Pct11, the more specific values must be used to represent the particular situation. For example, class 1 might consist of a set of circles and class 2 of a set of ellipses. If a typical circle of class 1 were chosen as St1, and a nearly circular ellipse were chosen as St2, the value of Pct1t2 would be close to unity. If such a selection is used for computation, Pct11, Pct1x, etc., should replace Pc11, Pc1x, etc., and in addition the conditional variance σ²(d|t) would replace σ²(d). These stimuli are then no longer typical selections from the classes of which they are members.

For classes such as squares versus circles, the maximum value of Pc12 was 0.63, obtained when the centers of the two figures coincided, according to a simulation experiment performed by Dr. Rosenblatt. For classes of this type any square or any circle may be taken for discrimination, and there is no need to modify equation (120).

Thus far it has been shown that the Alpha Perceptron can correctly discriminate between known, previously reinforced stimuli as nsr increases.

Consider the effect of the known reinforcement bias (1 − Pc12) when nsr is large. Unless the term (Pc11 − Pc12) is extremely small, the known reinforcement bias will have a negligible effect. Once nsr becomes large enough, whatever the size of (Pc11 − Pc12) (provided it is positive), the known reinforcement bias for any particular stimulus during the learning period becomes negligible. This means that the system will respond just as accurately to a test stimulus which has never before been presented or reinforced. The system therefore approaches a condition in which Pr is better than chance even for a stimulus of zero known bias, which shows the ability of the Perceptron to form perceptual generalizations.
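The argument can be made explicit with the numerator of Z in equation (117). The share of the expected net bias that comes from the known reinforcement is

$$\frac{1 - P_{c12}}{(n_{sr} - 1)(P_{c11} - P_{c12}) + (1 - P_{c12})} \longrightarrow 0 \quad \text{as } n_{sr} \to \infty, \text{ provided } P_{c11} > P_{c12}.$$

For example, with the illustrative (not measured) values Pc11 = 0.3 and Pc12 = 0.2, this share is 0.8/1.8 ≈ 0.44 for nsr = 11, 0.8/10.8 ≈ 0.074 for nsr = 101, and 0.8/100.8 ≈ 0.0079 for nsr = 1001.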

Expressions for Pg, the probability of correct generalization, are obtained from the equations for Pr with zero reinforcement bias of the test stimulus. Under the conditions of uniform nsr and fixed stimulus size, the expressions for Pg are the following.

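$$P_g(\Sigma) = \left[1 - (1 - P_a)^{N_e}\right] \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{Z} e^{-t^2/2}\,dt, \quad \text{where } Z = \frac{P_a N_e\, n_{sr}(P_{c11} - P_{c12})}{\sigma_{d(\Sigma)}} \qquad (121)$$

$$P_g(M) = \left[1 - (1 - P_a)^{N_e}\right] \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{Z} e^{-t^2/2}\,dt, \quad \text{where } Z = \frac{n_{sr}(P_{c11} - P_{c12})}{\sigma_{d(M)}} \qquad (122)$$

and σd(Σ) and σd(M) are the same as in equations (97) and (116), respectively.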
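A minimal Python sketch of equations (121) and (122) follows. As an assumption of the sketch, σd is supplied by the caller rather than computed from (97) or (116), and the function name pg is hypothetical. Comparing the numerators of (117) and (122) also shows what the known reinforcement contributes in the mean system: (nsr − 1)(Pc11 − Pc12) + (1 − Pc12) = nsr(Pc11 − Pc12) + (1 − Pc11), so the Pr numerator exceeds the Pg numerator by the constant 1 − Pc11, which becomes negligible as nsr grows.

```python
import math


def std_normal_cdf(z: float) -> float:
    """(1/sqrt(2*pi)) * integral from -inf to z of exp(-t**2/2) dt."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pg(pa: float, ne: int, nsr: int, pc11: float, pc12: float,
       sigma_d: float, sum_system: bool) -> float:
    """Probability of correct generalization, equations (121) and (122).

    sum_system=True gives Pg of the sum system (121), whose Z carries the
    factor Pa*Ne; sum_system=False gives Pg of the mean system (122).
    sigma_d is assumed to be supplied by the caller.
    """
    scale = pa * ne if sum_system else 1.0
    # Zero known bias: only the nsr reinforced stimuli of each class count.
    z = scale * nsr * (pc11 - pc12) / sigma_d
    activation = 1.0 - (1.0 - pa) ** ne
    return activation * std_normal_cdf(z)


# With Pc11 == Pc12 the classes cannot be told apart and Z = 0,
# so Pg reduces to half the activation factor.
print(pg(0.05, 200, 10, 0.2, 0.2, 1.0, sum_system=False))
```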
Plate IX illustrates the results of the performance equations for Pr and Pg for the mean-discriminating Alpha system. The system parameters, such as Ne, ω, etc., were modeled on the results of a particular square-circle discrimination simulation experiment. Pg may be interpreted as the probability that any circle or square placed at random within the bounds of the experiment is correctly recognized.

Three pairs of curves are given in Plate IX. One pair of curves (Pr and Pg versus nsr) is for a system with Ne = 100 units; the other two pairs are for Ne = 200 and Ne = 500. In all cases Pg starts slightly above the 0.5 level for small nsr and approaches an upper asymptote. Pg includes no known reinforcement, which matters most when nsr is small; as the number of stimuli increases, the missing term has a negligible effect, so that Pg approaches its upper asymptote. The curves for Pr are nearly unity for small nsr, since the known reinforcement suffers little interference from bias due to other associations. As nsr increases, Pr approaches the Pg asymptote, which can be made close to unity by increasing the number of effective A-units. Pr approaching Pg indicates that the specific reinforcement bias becomes increasingly negligible in comparison with the steadily increasing bias due to the difference in Pc's. In the limit, both Pr and Pg converge to the same asymptote.

Plate X shows three pairs of curves of the probability of error, 1 − Pg, versus Nr. The solid curves represent M = 0.5 and the broken curves represent a system with disjunct source-sets. The curves show that as the number of response units increases, the size of the system, in terms of the number of A-units needed to attain a given probability criterion, increases rapidly.

For a fixed ω, the variance for large systems increases with Nr². For small systems with disjunct source-sets, the variance increases with Nr. As can also be seen from the curves, overlapping source-sets are desirable for small systems, and disjunct source-sets are desirable for large systems.

CONCLUSION

The first analysis of this report was concerned with the performance of the Alpha and Gamma Perceptrons under ideal environment conditions. Although the major goal of using the random stimulus constraints was to achieve an analytical model for further analysis, several characteristics of the Perceptron emerged. It was found that the Alpha systems learned to respond with better-than-chance accuracy to previously reinforced stimuli. The probability of correct response decreased to a chance level with an increasing number of independent stimuli associated with each response unit. Correct response for the Gamma system was independent of variation in the number of stimuli associated with each response unit.

For the Alpha Perceptron, mean discrimination was superior to sum discrimination. For the Gamma Perceptron the probability of correct response was identical for both methods of discrimination. Of course, in an ideal environment there was no basis for generalization to stimuli that had not been previously reinforced, since no relationship among stimuli existed.

With a differentiated, or non-random, environment, the performance of the Alpha Perceptron with both methods of discrimination was investigated. Stimuli within a class were correlated, and Pc served as the measure of the relationship between stimuli.

Under these conditions the probability of correct response for stimuli of a class approached a better-than-chance asymptote with an increasing number of stimuli associated with a response unit. This asymptote approached one for a large enough number of A-units.

If the Perceptron was to show that it could adapt to its environment, it had to be capable of generalization. That is, after sufficient learning, it should be able to recognize stimuli of a class even though they had never before been presented to the system. With the stimuli within each class correlated, generalization was not only possible, but the probability of correct generalization also converged to the same asymptote as Pr. In other words, after considerable experience the Alpha Perceptron recognized stimuli which had never before been shown to the system just as accurately as stimuli which had been reinforced previously.
This report was written to investigate and evaluate the statistical analysis of the Perceptron proposed by Dr. Rosenblatt of Cornell Aeronautical Laboratory. The aim was to determine the feasibility of the self-adaptive cognitive system presented in reference 16.

The evaluation was carried out in coordination with electronic signal recognition research, Project 264 of the Engineering Experiment Station, Kansas State University. Project 264 shares the basic idea of Dr. Rosenblatt's work on the Perceptron: both projects deal with a system capable of learning the statistical characteristics of the input ensembles. However, the two projects differ considerably in the mechanisms they use to accomplish this goal.

A statistical analysis was employed to determine the characteristics and performance properties of several Perceptron models. The expressions for the accuracy of recognition under various specified sets of system parameters were illustrated by the graphs given at the end of each analysis.

Under ideal environment conditions the Perceptron systems investigated in this report were capable of associating a specific number of stimuli with specific response units. However, these associations could not be retained as the number of stimuli presented to the system increased. In other words, under these conditions the Perceptron systems were not capable of self-adapting to an environment of uncorrelated signal ensembles.


With uncorrelated signal ensembles there was no basis for generalization. For the Alpha Perceptron, mean discrimination resulted in better performance than sum discrimination.

The Gamma system proved capable of performance independent of the stimulus measure. For the Gamma Perceptron, correctness of response was the same for both methods of discrimination.

In a differentiated environment, where the stimuli within classes were correlated, self-adaptation to the environment was possible. In fact, the probability of correct response approached a better-than-chance asymptote with an increasing number of stimuli associated with a response unit. This asymptote approached unity for a large enough number of A-units.

The Perceptron was capable of generalization, so that self-adaptation to its environment was realized. With increasing experience, the probability of correct generalization converged to the same asymptote as Pr.


